Nucleic Acids Containing Single Nucleotide
Polymorphisms and Methods of Use Thereof 

Background of the Invention

Sequence polymorphism-based analysis of nucleic acid sequences can augment or
replace previously known methods for determining the identity and relatedness of
individuals. The approach is generally based on alterations in nucleic acid sequences
between related individuals. This analysis has been widely used in a variety of genetic,
diagnostic, and forensic applications. For example, polymorphism analyses are used in
identity and paternity analysis, and in genetic mapping studies.

One such type of variation is a restriction fragment length polymorphism (RFLP).
RFLPs can create or delete a recognition sequence for a restriction endonuclease in one
nucleic acid relative to a second nucleic acid. The result of the variation is an alteration in the
relative length of restriction enzyme-generated DNA fragments in the two nucleic acids.

Other polymorphisms take the form of short tandem repeat (STR) sequences, which
are also referred to as variable number of tandem repeat (VNTR) sequences. STR sequences
typically include tandem repeats of 2, 3, or 4 nucleotide sequences that are present in a
nucleic acid from one individual but absent from a second, related individual at the
corresponding genomic location.

Other polymorphisms take the form of single nucleotide variations, termed single
nucleotide polymorphisms (SNPs), between individuals. A SNP can, in some instances, be
referred to as a "cSNP" to denote that the nucleotide sequence containing the SNP originates
as a cDNA.

SNPs can arise in several ways. A single nucleotide polymorphism may arise due to a
substitution of one nucleotide for another at the polymorphic site. Substitutions can be
transitions or transversions. A transition is the replacement of one purine nucleotide by
another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the
replacement of a purine by a pyrimidine, or the converse.
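
The distinction between transitions and transversions can be illustrated with a short sketch. The function name and the use of single-letter base codes are assumptions made for illustration only; they are not part of the disclosure.

```python
# Illustrative sketch: classify a single-base substitution.
# Purines are A and G; pyrimidines are C, T and (in RNA) U.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref -> alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T, U")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, an A-to-G change stays within the purines and is reported as a transition, while an A-to-C change crosses classes and is reported as a transversion.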

Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an
insertion of a nucleotide relative to a reference allele. Thus, the polymorphic site is a site at
which one allele bears a gap with respect to a single nucleotide in another allele. Some SNPs
occur within or near genes. One such class includes SNPs falling within regions of genes
encoding a polypeptide product. These SNPs may result in an alteration of the amino acid
sequence of the polypeptide product and give rise to the expression of a defective or other
variant protein. Such variant products can, in some cases, result in a pathological condition,
e.g., genetic disease. Examples of genetic diseases in which a polymorphism within a coding
sequence gives rise to the disease include sickle cell anemia and cystic fibrosis. Other SNPs do not
result in alteration of the polypeptide product. Of course, SNPs can also occur in noncoding
regions of genes.

SNPs tend to occur with great frequency and are spaced uniformly throughout the
genome. The frequency and uniformity of SNPs mean that there is a greater probability that
such a polymorphism will be found in close proximity to a genetic locus of interest.

Summary of the Invention 

The invention is based in part on the discovery of novel single nucleotide 
polymorphisms (SNPs) in regions of human DNA. 

Accordingly, in one aspect, the invention provides an isolated polynucleotide which
includes one or more of the SNPs described herein. The polynucleotide can be, e.g., a
nucleotide sequence which includes one or more of the polymorphic sequences shown in
Table 1 and the Sequence Listing (SEQ ID NOS: 1-7867) and which includes a polymorphic
sequence, or a fragment of the polymorphic sequence, as long as it includes the polymorphic
site. The polynucleotide may alternatively contain a nucleotide sequence which includes a
sequence complementary to one or more of the sequences (SEQ ID NOS: 1-7867), or a
fragment of the complementary nucleotide sequence, provided that the fragment includes a
polymorphic site in the polymorphic sequence.
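
The two conditions in this aspect (a fragment must still span the polymorphic site, and the complementary strand carries the site at a mirrored position) can be sketched as follows. This is a minimal illustration only: the helper names, 0-based indexing, and the assumption that the complementary sequence is written 5' to 3' as a reverse complement are all choices made here, not details taken from the specification.

```python
# Illustrative helpers; names and 0-based coordinates are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragment_includes_site(frag_start: int, frag_length: int, site_index: int) -> bool:
    """True if a fragment [frag_start, frag_start + frag_length) spans the site."""
    return frag_start <= site_index < frag_start + frag_length


def site_index_on_complement(seq_length: int, site_index: int) -> int:
    """Position of the polymorphic site on the reverse-complement strand."""
    return seq_length - 1 - site_index
```

Under these assumptions, a 10-base fragment starting at index 20 includes a site at index 25, whereas a 10-base fragment starting at index 0 does not.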

The polynucleotide can be, e.g., DNA or RNA, and can be between about 10 and
about 100 nucleotides, e.g., 10-90, 10-75, 10-51, 10-40, or 10-30 nucleotides in length.

In some embodiments, the polymorphic site in the polymorphic sequence includes a
nucleotide other than the nucleotide listed in Table 1, column 5 for the polymorphic
sequence, e.g., the polymorphic site includes the nucleotide listed in Table 1, column 6 for
the polymorphic sequence.

In other embodiments, the complement of the polymorphic site includes a nucleotide
other than the complement of the nucleotide listed in Table 1, column 5 for the complement
of the polymorphic sequence, e.g., the complement of the nucleotide listed in Table 1,
column 6 for the polymorphic sequence.

In some embodiments, the polymorphic sequence is associated with a polypeptide 
related to one of the protein families disclosed herein. For example, the nucleic acid may be
associated with a polypeptide related to an ATPase associated protein, a cadherin, or any of 
the other proteins identified in Table 1, column 10. 

In another aspect, the invention provides an isolated allele-specific oligonucleotide
that hybridizes to a first polynucleotide containing a polymorphic site. The first
polynucleotide can be, e.g., a nucleotide sequence comprising one or more polymorphic
sequences (SEQ ID NOS: 1-7867), provided that the polymorphic sequence includes a
nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic
sequence. Alternatively, the first polynucleotide can be a nucleotide sequence that is a
fragment of the polymorphic sequence, provided that the fragment includes a polymorphic
site in the polymorphic sequence, or a complementary nucleotide sequence which includes a
sequence complementary to one or more polymorphic sequences (SEQ ID NOS: 1-7867),
provided that the complementary nucleotide sequence includes a nucleotide other than the
complement of the nucleotide recited in Table 1, column 5. The first polynucleotide may in
addition include a nucleotide sequence that is a fragment of the complementary sequence,
provided that the fragment includes a polymorphic site in the polymorphic sequence.

In some embodiments, the oligonucleotide does not hybridize under stringent
conditions to a second polynucleotide. The second polynucleotide can be, e.g., (a) a
nucleotide sequence comprising one or more polymorphic sequences (SEQ ID NOS: 1-7867),
wherein the polymorphic sequence includes the nucleotide listed in Table 1, column 5
for the polymorphic sequence; (b) a nucleotide sequence that is a fragment of any of the
polymorphic sequences; (c) a complementary nucleotide sequence including a sequence
complementary to one or more polymorphic sequences (SEQ ID NOS: 1-7867), wherein the
polymorphic sequence includes the complement of the nucleotide listed in Table 1, column 5;
and (d) a nucleotide sequence that is a fragment of the complementary sequence, provided
that the fragment includes a polymorphic site in the polymorphic sequence.

The oligonucleotide can be, e.g., between about 10 and about 100 bases in length. In
some embodiments, the oligonucleotide is between about 10 and 75 bases, 10 and 51 bases,
10 and about 40 bases, or about 15 and 30 bases in length.

The invention also provides a method of detecting a polymorphic site in a nucleic
acid. The method includes contacting the nucleic acid with an oligonucleotide that hybridizes
to a polymorphic sequence selected from the group consisting of SEQ ID NOS: 1-7867, or its
complement, provided that the polymorphic sequence includes a nucleotide other than the
nucleotide recited in Table 1, column 5 for the polymorphic sequence, or the complement
includes a nucleotide other than the complement of the nucleotide recited in Table 1, column
5. The method also includes determining whether the nucleic acid and the oligonucleotide
hybridize. Hybridization of the oligonucleotide to the nucleic acid sequence indicates the
presence of the polymorphic site in the nucleic acid.
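
The logic of allele-specific detection can be sketched in silico. This is a deliberately simplified toy model, not the assay itself: "hybridization" is approximated here as perfect Watson-Crick complementarity somewhere in the target, whereas real hybridization depends on stringency, temperature, salt, and probe length. The function names and example sequences are hypothetical.

```python
# Toy model only: real hybridization is not an exact string match.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hybridizes(oligo: str, target: str) -> bool:
    """Approximate hybridization as an exact complementary match in the target."""
    return reverse_complement(oligo) in target.upper()


# Hypothetical flanking sequence with a T (reference) or C (variant) at the site.
FLANK_5, FLANK_3 = "ACGTTAGCCA", "GGATCCAGT"
reference_allele = FLANK_5 + "T" + FLANK_3
variant_allele = FLANK_5 + "C" + FLANK_3

# An allele-specific probe is the complement of the variant allele.
variant_probe = reverse_complement(variant_allele)

print(hybridizes(variant_probe, variant_allele))    # variant present
print(hybridizes(variant_probe, reference_allele))  # variant absent
```

In this model the probe matches only the variant allele, which mirrors the statement that hybridization indicates the presence of the polymorphic site.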

In preferred embodiments, the oligonucleotide does not hybridize to the polymorphic 
sequence when the polymorphic sequence includes the nucleotide recited in Table 1, column 
5 for the polymorphic sequence, or when the complement of the polymorphic sequence
includes the complement of the nucleotide recited in Table 1, column 5 for the polymorphic
sequence. 

The oligonucleotide can be, e.g., between about 10 and about 100 bases in length. In
some embodiments, the oligonucleotide is between about 10 and 75 bases, 10 and 51 bases,
10 and about 40 bases, or about 15 and 30 bases in length.

In some embodiments, the polymorphic sequence identified by the oligonucleotide is
associated with a polypeptide related to one of the protein families disclosed herein. For
example, the nucleic acid may be associated with a polypeptide related to an ATPase associated
protein, cadherin, or any of the other protein families identified in Table 1, column 10.

In another aspect, the method includes determining if a sequence polymorphism is
present in a subject, such as a human. The method includes providing a nucleic acid from the
subject and contacting the nucleic acid with an oligonucleotide that hybridizes to a
polymorphic sequence selected from the group consisting of SEQ ID NOS: 1-7867, or its
complement, provided that the polymorphic sequence includes a nucleotide other than the
nucleotide recited in Table 1, column 5 for said polymorphic sequence, or the complement
includes a nucleotide other than the complement of the nucleotide recited in Table 1,
column 5. Hybridization between the nucleic acid and the oligonucleotide is then
determined. Hybridization of the oligonucleotide to the nucleic acid sequence indicates the
presence of the polymorphism in said subject.



The invention also provides a method of determining the relatedness of
a first and second nucleic acid. The method includes providing a first nucleic acid and a
second nucleic acid and contacting the first nucleic acid and the second nucleic acid with an
oligonucleotide that hybridizes to a polymorphic sequence selected from the group consisting
of SEQ ID NOS: 1-7867, or its complement, provided that the polymorphic sequence
includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the
polymorphic sequence, or the complement includes a nucleotide other than the complement
of the nucleotide recited in Table 1, column 5. The method also includes determining
whether the first nucleic acid and the second nucleic acid hybridize to the oligonucleotide
and comparing hybridization of the first and second nucleic acids to the oligonucleotide.
Hybridization of the first and second nucleic acids to the oligonucleotide indicates the first and
second subjects are related.

In preferred embodiments, the oligonucleotide does not hybridize to the polymorphic
sequence when the polymorphic sequence includes the nucleotide recited in Table 1, column
5 for the polymorphic sequence, or when the complement of the polymorphic sequence
includes the complement of the nucleotide recited in Table 1, column 5 for the polymorphic
sequence.

The oligonucleotide can be, e.g., between about 10 and about 100 bases in length. In
some embodiments, the oligonucleotide is between about 10 and 75 bases, 10 and 51 bases,
10 and about 40 bases, or about 15 and 30 bases in length.

The method can be used in a variety of applications. For example, the first nucleic
acid may be isolated from physical evidence gathered at a crime scene, and the second
nucleic acid may be obtained from a person suspected of having committed the crime.
Matching the two nucleic acids using the method can establish whether the physical evidence
originated from the person.

In another example, the first sample may be from a human male suspected of being
the father of a child and the second sample may be from the child. Establishing a match
using the described method can establish whether the male is the father of the child.

In another aspect, the invention provides an isolated polypeptide comprising a
polymorphic site at one or more amino acid residues, and wherein the protein is encoded by a
polynucleotide including one of the polymorphic sequences SEQ ID NOS: 1-7867, or their
complement, provided that the polymorphic sequence includes a nucleotide other than the
nucleotide recited in Table 1, column 5 for the polymorphic sequence, or the complement
includes a nucleotide other than the complement of the nucleotide recited in Table 1, column
5.

The polypeptide can be, e.g., related to one of the protein families disclosed herein. 
For example, the polypeptide can be related to an ATPase associated protein, cadherin, or any 
of the other proteins provided in Table 1, column 10.

In some embodiments, the polypeptide is translated in the same open reading frame as 
is a wild type protein whose amino acid sequence is identical to the amino acid sequence of 
the polymorphic protein except at the site of the polymorphism. 

In some embodiments, the polypeptide is encoded by a polymorphic sequence that
includes the nucleotide listed in Table 1, column 6 for the polymorphic sequence, or by its
complement, where the complement includes the complement of the nucleotide listed in Table 1,
column 6.

The invention also provides an antibody that binds specifically to a polypeptide
encoded by a polynucleotide comprising a nucleotide sequence
selected from the group consisting of polymorphic sequences SEQ ID NOS: 1-7867, or its
complement. The polymorphic sequence includes a nucleotide other than the nucleotide
recited in Table 1, column 5 for the polymorphic sequence, or the complement includes a
nucleotide other than the complement of the nucleotide recited in Table 1, column 5.

In some embodiments, the antibody binds specifically to a polypeptide encoded by a 
polymorphic sequence which includes the nucleotide listed in Table 1, column 6 for the
polymorphic sequence. 

Preferably, the antibody does not bind specifically to a polypeptide encoded by a
polymorphic sequence which includes the nucleotide listed in Table 1, column 5 for the
polymorphic sequence.

The invention further provides a method of detecting the presence of a polypeptide
having one or more amino acid residue polymorphisms in a subject. The method includes
providing a protein sample from the subject and contacting the sample with the
above-described antibody under conditions that allow for the formation of antibody-antigen
complexes. The antibody-antigen complexes are then detected. The presence of the
complexes indicates the presence of the polypeptide.

The invention also provides a method of treating a subject suffering from, at risk for,
or suspected of suffering from a pathology ascribed to the presence of a sequence
polymorphism in a subject, e.g., a human, non-human primate, cat, dog, rat, mouse, cow, pig,
goat, or rabbit. The method includes providing a subject suffering from a pathology
associated with aberrant expression of a first nucleic acid comprising a polymorphic sequence
selected from the group consisting of SEQ ID NOS: 1-7867, or its complement, and treating
the subject by administering to the subject an effective dose of a therapeutic agent. Aberrant
expression can include qualitative alterations in expression of a gene, e.g., expression of a
gene encoding a polypeptide having an altered amino acid sequence with respect to its
wild-type counterpart. Qualitatively different polypeptides can include shorter, longer, or altered
polypeptides relative to the amino acid sequence of the wild-type polypeptide. Aberrant
expression can also include quantitative alterations in expression of a gene. Examples of
quantitative alterations in gene expression include lower or higher levels of expression of the
gene relative to its wild-type counterpart, or alterations in the temporal or tissue-specific
expression pattern of a gene. Finally, aberrant expression may also include a combination of
qualitative and quantitative alterations in gene expression.

The therapeutic agent can be administered to a subject suffering from a pathology
associated with aberrant expression of a first nucleic acid comprising a polymorphic
sequence. The therapeutic agent can include, e.g., a second nucleic acid comprising the
polymorphic sequence, provided that the second nucleic acid comprises the nucleotide
present in the wild type allele. In some embodiments, the second nucleic acid sequence
comprises a polymorphic sequence which includes the nucleotide listed in Table 1, column 5
for the polymorphic sequence.

WO 01/47944 PCT/US00/35498

Alternatively, the therapeutic agent can be a polypeptide encoded by a polynucleotide
comprising a polymorphic sequence selected from the group consisting of SEQ ID NOS: 1-7867,
or by a polynucleotide comprising a nucleotide sequence that is complementary to any
one of the polymorphic sequences SEQ ID NOS: 1-7867, provided that the polymorphic
sequence includes the nucleotide listed in Table 1, column 6 for the polymorphic sequence.

The therapeutic agent may further include an antibody as herein described, or an
oligonucleotide comprising a polymorphic sequence selected from the group consisting of
SEQ ID NOS: 1-7867, or a polynucleotide comprising a nucleotide sequence that is
complementary to any one of polymorphic sequences SEQ ID NOS: 1-7867, provided that
the polymorphic sequence includes the nucleotide listed in Table 1, column 5 or Table 1,
column 6 for the polymorphic sequence.

In another aspect, the invention provides an oligonucleotide array comprising one or
more oligonucleotides hybridizing to a first polynucleotide at a polymorphic site
encompassed therein. The first polynucleotide can be, e.g., a nucleotide sequence comprising
one or more polymorphic sequences (SEQ ID NOS: 1-7867); a nucleotide sequence that is a
fragment of any of the nucleotide sequences, provided that the fragment includes a
polymorphic site in the polymorphic sequence; a complementary nucleotide sequence
comprising a sequence complementary to one or more polymorphic sequences (SEQ ID
NOS: 1-7867); or a nucleotide sequence that is a fragment of the complementary sequence,
provided that the fragment includes a polymorphic site in the polymorphic sequence.

In preferred embodiments, the array comprises 10; 100; 1,000; 10,000; 100,000 or 
more oligonucleotides. 

The invention also provides a kit comprising one or more of the herein-described
nucleic acids. The kit can include, e.g., a polynucleotide which includes one or more of the
SNPs described herein. The polynucleotide can be, e.g., a nucleotide sequence which
includes one or more of the polymorphic sequences shown in Table 1 and the Sequence
Listing (SEQ ID NOS: 1-7867) and which includes a polymorphic sequence, or a fragment
of the polymorphic sequence, as long as it includes the polymorphic site. The
polynucleotide may alternatively contain a nucleotide sequence which includes a sequence
complementary to one or more of the sequences (SEQ ID NOS: 1-7867), or a fragment of the
complementary nucleotide sequence, provided that the fragment includes a polymorphic site
in the polymorphic sequence. The invention provides an isolated allele-specific
oligonucleotide that hybridizes to a first polynucleotide containing a polymorphic site. The
first polynucleotide can be, e.g., a nucleotide sequence comprising one or more polymorphic
sequences (SEQ ID NOS: 1-7867), provided that the polymorphic sequence includes a
nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic
sequence. Alternatively, the first polynucleotide can be a nucleotide sequence that is a
fragment of the polymorphic sequence, provided that the fragment includes a polymorphic
site in the polymorphic sequence, or a complementary nucleotide sequence which includes a
sequence complementary to one or more polymorphic sequences (SEQ ID NOS: 1-7867),
provided that the complementary nucleotide sequence includes a nucleotide other than the
complement of the nucleotide recited in Table 1, column 5. The first polynucleotide may in
addition include a nucleotide sequence that is a fragment of the complementary sequence,
provided that the fragment includes a polymorphic site in the polymorphic sequence.

Unless otherwise defined, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art to which this invention
belongs. Although methods and materials similar or equivalent to those described herein can
be used in the practice or testing of the present invention, suitable methods and materials are
described below. All publications, patent applications, patents, and other references
mentioned herein are incorporated by reference in their entirety. In the case of conflict, the
present specification, including definitions, will control. In addition, the materials, methods,
and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following 
detailed description and claims. 



Detailed Description of the Invention 

The invention provides human SNPs in sequences which are transcribed, i.e., are
cSNPs. Many SNPs have been identified in genes related to polypeptides of known function.
If desired, SNPs associated with various polypeptides can be used together. For example,
SNPs can be grouped according to whether they are derived from a nucleic acid encoding a
polypeptide related to a particular protein family or involved in a particular function.
Similarly, SNPs can be grouped according to the functions played by their gene products.
Such functions include structural proteins and proteins which are associated with metabolic
pathways, including fatty acid metabolism, glycolysis, intermediary metabolism, calcium
metabolism, proteases, and amino acid metabolism, etc. Specifically, the present invention
provides a large number of human cSNPs based on at least one gene product that has not
been previously identified. In contrast, and as defined specifically in the following
paragraph, the cSNPs involve nucleic acid sequences that are assembled from at least one
known sequence.

7867 distinct polymorphic sites were identified by the present inventors, using the
following procedure. Raw traces underlying sequence data were drawn from public
databases and from the proprietary database of the Assignee of the present invention. The
sequences were obtained by calling the bases from these traces, and included assigning
"Phred" quality scores for each called base. For each allelic set, at the polynucleotide level,
four or more nucleotide sequences were identified having at least partial overlap with one
another.
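
For readers unfamiliar with Phred scores, the conventional definition relates a score Q to the estimated probability P that a base call is wrong by Q = -10 log10(P). The sketch below shows that conversion; the function names are chosen here for illustration and are not taken from the specification.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)
```

On this scale a score of 20 corresponds to an estimated error probability of 1 in 100, and a score of 30 to 1 in 1000; a score of 23 falls at roughly 1 in 200.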

As illustrated in FIG. 1, these four or more sequences could be clustered and
assembled to make a consensus contig that included an ORF. In this way, the inventors
found that the assembled contigs defined associated sets of two, or possibly more than two,
alleles defined by an SNP at a particular polymorphic site. In order to be confirmed as a SNP
site, the nucleotide change from the consensus sequence had to occur in at least two
individual sequences, and had to have a "Phred" score of 23 or higher at the site of the
presumed SNP. Furthermore, in a window of 5 bases on either side of the SNP, no more than
50% mismatching with the consensus sequence was allowed. In the assembly leading to each
of the contigs defining the allelic set, the SNP alleles occur in polynucleotides found in public
databases. Furthermore, it was found that the assembled contigs defined associated sets of
two, or possibly more than two, alleles defined by an SNP at a particular polymorphic site.
These associations were not previously known. The SNPs are presented in Table 1.
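
The three confirmation criteria stated above (at least two supporting sequences, a Phred score of 23 or higher at the site, and no more than 50% mismatching in a window of 5 bases on either side) can be expressed as a filter. The following is a minimal sketch under stated assumptions, not the inventors' software: reads are assumed to be already aligned gap-free to the consensus over its full length, the Phred requirement is assumed to apply to each supporting read at the site, and the mismatch fraction is computed per read. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MIN_SUPPORTING_READS = 2   # change must occur in at least two sequences
MIN_PHRED = 23             # minimum Phred score at the presumed SNP
WINDOW = 5                 # bases examined on either side of the SNP
MAX_WINDOW_MISMATCH = 0.5  # at most 50% mismatching in the window


@dataclass
class AlignedRead:
    bases: str        # assumed gap-free and the same length as the consensus
    quals: List[int]  # one Phred score per base


def window_mismatch_fraction(read: AlignedRead, consensus: str, site: int) -> float:
    """Fraction of flanking positions (site excluded) that differ from consensus."""
    lo = max(0, site - WINDOW)
    hi = min(len(consensus), site + WINDOW + 1)
    positions = [i for i in range(lo, hi) if i != site]
    if not positions:
        return 0.0
    mismatches = sum(read.bases[i] != consensus[i] for i in positions)
    return mismatches / len(positions)


def supports_variant(read: AlignedRead, consensus: str, site: int, variant: str) -> bool:
    """True if this read carries the variant and passes both quality checks."""
    return (
        read.bases[site] == variant
        and read.quals[site] >= MIN_PHRED
        and window_mismatch_fraction(read, consensus, site) <= MAX_WINDOW_MISMATCH
    )


def is_confirmed_snp(reads: List[AlignedRead], consensus: str, site: int, variant: str) -> bool:
    """Apply the three criteria to a candidate variant base at a 0-based site."""
    if variant == consensus[site]:
        return False
    supporting = sum(supports_variant(r, consensus, site, variant) for r in reads)
    return supporting >= MIN_SUPPORTING_READS
```

A real pipeline would also need to handle gapped alignments and partial overlaps, which this sketch omits for brevity.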

At the level of translation of an ORF contained in the contigs, however, the inventors
identified allelic sets in which one allele defines a known polypeptide sequence that includes
the polymorphic site and another polypeptide allele is not previously known. Then, various
associations of alleles are possible. For example, it is possible that an allelic pair is defined
in a noncoding region of the contig containing an ORF. In such cases the inventors believe
that the invention resides in the recognition of the allelic pair; this association has not
heretofore been made. Alternatively, sets of allelic contigs may exist in which the
polymorphic site is within an ORF, but does not result in an amino acid change among the
allelic polypeptides. Here too it is believed that the invention resides in the recognition of the
allelic pair, and that this association has not heretofore been made. In yet another alternative,
the polymorphic site resides within an ORF and results in an amino acid change, or a
frameshift, among the alleles of the allelic set. In the sets of gene products that fell within
this group, at least one of the alleles at the polypeptide level is a known protein. At least one
of the remaining allele or alleles in the set, carrying a variant amino acid at the polymorphic
site, is a novel polypeptide not heretofore known. The invention resides at least in the
recognition of the polymorphic allele as being a variant of the known reference polypeptide.
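
The alternatives described above (a site in a noncoding region, a site within an ORF that leaves the amino acid unchanged, and a site within an ORF that changes the amino acid or shifts the frame) can be distinguished computationally. The sketch below is illustrative only. It assumes the standard genetic code, 0-based contig coordinates, an ORF whose length is a multiple of three, and the use of "-" to denote an allele bearing a single-nucleotide gap; the function name and labels are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(contig: str, orf_start: int, orf_end: int, site: int, alt: str) -> str:
    """Classify a variant at `site` relative to an ORF spanning [orf_start, orf_end)."""
    if not orf_start <= site < orf_end:
        return "noncoding"
    if alt == "-":
        return "frameshift"  # a single-nucleotide gap within the ORF
    offset = (site - orf_start) % 3
    codon_start = site - offset
    ref_codon = contig[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "silent"
    return "amino acid change"
```

For example, within a GAG (glutamate) codon, changing the middle base to T gives GTG (valine) and is reported as an amino acid change, whereas changing the third base to A gives GAA, which still encodes glutamate and is reported as silent.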

Table 1 provides information concerning the allelic sequences. One of the sequences
may be termed a reference polymorphic sequence, and the corresponding second sequence
includes the variant SNP at the polymorphic site. Since the reference polypeptide sequence is
already known, the Sequence Listing accompanying this application provides only the
sequence of the polymorphic allele, while its SEQ ID NO is provided in the Table. A
reference to the SEQ ID NO that corresponds to the translated amino acid sequence is also
given. The Table includes thirteen columns that provide descriptive information for each
cSNP, each of which occupies one row in the Table. The column headings, and a description
of each, are given below.

SNPs disclosed in Table 1 were detected by aligning large numbers of sequences from
genetically diverse sources of publicly available mRNA libraries (Clontech). Software
designed specifically to look for multiple examples of variant bases differing from a
consensus sequence was created and deployed. A criterion of a minimum of 2 occurrences of a
sequence differing from the consensus in high quality sequence reads was used to identify an
SNP.
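
The detection step described here, namely finding positions at which a base differing from the consensus occurs at least twice, can be sketched as a column-by-column count over aligned reads. This is not the software referred to in the text, whose details are not disclosed. The sketch assumes equal-length, pre-aligned reads that have already been filtered for quality, and it takes the most frequent base in each column as the consensus; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def find_candidate_snps(
    aligned_reads: List[str], min_occurrences: int = 2
) -> List[Tuple[int, str, str, int]]:
    """Return (position, consensus_base, variant_base, count) for each candidate."""
    candidates = []
    for pos in range(len(aligned_reads[0])):
        # Count only unambiguous bases; gaps and N calls are ignored.
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        if not counts:
            continue
        consensus_base = counts.most_common(1)[0][0]
        for base, n in sorted(counts.items()):
            if base != consensus_base and n >= min_occurrences:
                candidates.append((pos, consensus_base, base, n))
    return candidates
```

A variant seen in only one read is treated as a likely sequencing error and is not reported, which reflects the stated minimum of 2 occurrences.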

The SNPs described herein may be useful in diagnostic kits, for DNA arrays on chips
and for other uses that involve hybridization of the SNP. Specific SNPs may have utility
where a disease has already been associated with that gene. Examples of possible disease
correlations between the claimed SNPs with members of the genes of each classification
are listed below:

Amylases 

[...] delayed maturation and of various amylase-producing neoplasms and [...]

Amyloid

Amyloid A (SAA) proteins comprise a family of vertebrate proteins that [...]
members of the family is greatly increased in inflammation. Prolonged [...]
SAA levels, as in chronic inflammation, results in a pathological condition [...]
insoluble accumulation of SAA in [...] with type II diabetes mellitus.



Angiopoietin

[...] of the angiopoietin/fibrinogen family have been shown to [...]
generation of new blood vessels [...] inhibit the generation of [...]
an essential step in [...] of heart disease [...] to expand. Variation in these may be
predictive [...] metastasis. In particular, these variants may be predictive of the response to
antihypertensive drugs and chemotherapeutic and anti-tumor agents.

[...] preventing their target cells from dying. [...]

Colony-stimulating factor-related proteins

[...] colony-stimulating factors are cytokines that act in [...] populations of the
blood, the granulocytes and the monocytes-macrophages.

Complement-related proteins

Complement proteins are immune-associated cytotoxic agents, acting in a chain
reaction to exterminate target cells that were opsonized (primed) with antibodies, by
forming a membrane attack complex (MAC). The mechanism of killing is by [...]
in the target cell membrane. Variations in complement genes or their inhibitors are
associated with many autoimmune disorders. Modified serum levels of [...]
phagocytic ability. [...] of complement genes may also be indicative of type I
diabetes mellitus, meningitis, neurological disorders such as nemaline myopathy, neonatal
hypotonia, muscular disorders such as congenital myopathy, and other diseases.

Cytochrome 

The respiratory chain is a key biochemical pathway which is essential to all aerobic
cells. There are five different cytochromes involved in the chain. These are heme-bound
proteins which serve as electron carriers. Modifications in these genes may be predictive of
ataxia, areflexia, dementia, and myopathic and neuropathic changes in muscles. They may
also be associated with various types of solid tumors.

Kinesins 

Kinesins are tubulin molecular motors that function to transport organelles within
cells and to move chromosomes along microtubules during cell division. Modifications of
these genes may be indicative of neurological disorders such as Pick disease of the brain and
tuberous sclerosis.

Cytokines, Interferon, Interleukin 

Members of the cytokine families are known for their potent ability to stimulate cell
growth and division even at low concentrations. Cytokines such as erythropoietin are
cell-specific in their growth stimulation; erythropoietin is useful for the stimulation of the
proliferation of erythroblasts. Variants in cytokines may be predictive for a wide variety of
diseases, including cancer predisposition.

G-protein coupled receptors

G-protein coupled receptors (also called R7G) are an extensive group of receptors for
hormones, neurotransmitters, odorants and light which transduce extracellular signals by
interaction with guanine nucleotide-binding (G) proteins. Alterations in genes coding for
G-protein coupled receptors may be involved in and indicative of a vast number of physiological
conditions. These include blood pressure regulation, renal dysfunctions, male infertility,
dopamine-associated cognitive, emotional, and endocrine functions, hypercalcemia,
chondrodysplasia and osteoporosis, pseudohypoparathyroidism, growth retardation and
dwarfism.
Eukaryotic thiol proteases are a family of [...]

[...] polymorphic site embedded in a polymorphic sequence. The polymorphic site is occupied
by a single [...] indicated in Table 1, column 5, at the polymorphic site in the
polymorphic sequence. [...] provides only the sequence of the polymorphic allele; its
SEQ ID NO: [...] in Table 1. A reference to the SEQ ID NO: [...] sequence is also given
if appropriate. [...] SEQ ID NOs: as well in the Sequence Listing of the application.
Conversely, each sequence entry in the Sequence Listing also includes a cross-reference
to the CuraGen
sequence ID, under the label "CuraGen sequence ID". The first SEQ ID NO: given in the 
first column of each row of the Table is the SEQ ID NO: identifying the nucleic acid 
sequence for the polymorphisms. If a polymorphism carries an entry for an amino acid in a 
coding region, then a second SEQ ID NO: appears in parentheses in the column "Amino acid
after" (see below) for the polymorphic amino acid sequence. The latter SEQ ID NOs: refer
to amino acid sequences giving the polymorphic amino acid sequences that are the translation 
of the nucleotide polymorphism. If a polymorphism carries no entry for the protein portion 
of the row, only one SEQ ID NO: is provided, in the first column. 

"Base pos. of SNP" gives the numerical position of the nucleotide in the nucleic acid 
at which the cSNP is found, as identified in this invention. 

"Polymorphic sequence" provides a 51-base sequence with the polymorphic site at the 
26th base in the sequence, as well as 25 bases from the reference sequence on the 5' side and
the 3' side of the polymorphic site. The designation at the polymorphic site is enclosed in
square brackets, and provides first, the reference nucleotide; second, a "slash (/)"; and third,
the polymorphic nucleotide. In certain cases the polymorphism is an insertion or a deletion.
In that case, the position which is "unfilled" (i.e., the reference or the polymorphic position)
is indicated by the word "gap".
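
The bracket notation described above can be read mechanically. The following is a minimal sketch, assuming entries of the form 5'-flank[reference/polymorphic]3'-flank with the word "gap" marking an unfilled position; the function names and the example sequence are hypothetical and are not taken from the Table.

```python
import re


def parse_polymorphic_sequence(entry):
    """Split a bracketed entry into (5' flank, reference, polymorphic, 3' flank)."""
    match = re.fullmatch(r"([ACGTacgt]*)\[(\w+)/(\w+)\]([ACGTacgt]*)", entry)
    if match is None:
        raise ValueError(f"not a bracketed polymorphic sequence: {entry!r}")
    five_prime, ref, alt, three_prime = match.groups()
    # "gap" marks the unfilled position of an insertion or a deletion.
    ref = "" if ref.lower() == "gap" else ref.upper()
    alt = "" if alt.lower() == "gap" else alt.upper()
    return five_prime.upper(), ref, alt, three_prime.upper()


def alleles(entry):
    """Return the full reference and polymorphic sequences for one entry."""
    five_prime, ref, alt, three_prime = parse_polymorphic_sequence(entry)
    return five_prime + ref + three_prime, five_prime + alt + three_prime


# Hypothetical 51-base entry: 25 flanking bases on each side of the site.
example = "A" * 25 + "[C/T]" + "G" * 25
reference, variant = alleles(example)
```

For a substitution, both returned sequences are 51 bases long with the polymorphic site at the 26th base; for a "gap" entry the two sequences differ in length by the inserted or deleted base.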

"Base before" provides the nucleotide present in the reference sequence at the 
position at which the polymorphism is found.

"Base after" provides the altered nucleotide at the position of the polymorphism.

"Amino acid before" provides the amino acid in the reference protein, if the
polymorphism occurs in a coding region.

"Amino acid after" provides the amino acid in the polymorphic protein, if the
polymorphism occurs in a coding region. This column also includes the SEQ ID NO: in
parentheses for the translated polymorphic amino acid sequence if the polymorphism occurs
in a coding region.

"Type of change" provides information on the nature of the polymorphism. 

"SILENT-NONCODING" is used if the polymorphism occurs in a noncoding
region of a nucleic acid.

"SILENT-CODING" is used if the polymorphism occurs in a coding region of
a nucleic acid and results in no change of amino acid in the
translated polymorphic protein.

"CONSERVATIVE" is used if the polymorphism occurs in a coding region of
a nucleic acid and provides a change in which the altered amino acid falls in
the same class as the reference amino acid. The classes are:

Aliphatic: Gly, Ala, Val, Leu, Ile;

Aromatic: Phe, Tyr, Trp;

Sulfur-containing: Cys, Met;

Aliphatic OH: Ser, Thr;

Basic: Lys, Arg, His;

Acidic: Asp, Glu, Asn, Gln;

Pro falls in none of the other classes; and 

End defines a termination codon. 

"NONCONSERVATIVE" is used if the polymorphism occurs in a coding 
region of a nucleic acid and provides a change in which the altered amino acid
falls in a different class than the reference amino acid.

"FRAMESHIFT" relates to an insertion or a deletion. If the frameshift occurs
in a coding region, the Table provides the translation of the frameshifted
codons 3' to the polymorphic site.
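
The "Type of change" categories above amount to a small decision procedure. The sketch below is illustrative only: the function name and argument names are hypothetical, and treating Pro and End as single-member classes of their own is an assumption made here for convenience.

```python
# Amino acid classes as listed in the text (three-letter codes).
AMINO_ACID_CLASSES = {
    "Aliphatic": ["Gly", "Ala", "Val", "Leu", "Ile"],
    "Aromatic": ["Phe", "Tyr", "Trp"],
    "Sulfur-containing": ["Cys", "Met"],
    "Aliphatic OH": ["Ser", "Thr"],
    "Basic": ["Lys", "Arg", "His"],
    "Acidic": ["Asp", "Glu", "Asn", "Gln"],
    "Pro": ["Pro"],  # assumption: Pro is modeled as its own class
    "End": ["End"],  # assumption: a termination codon is modeled as its own class
}

CLASS_OF = {
    residue: name
    for name, members in AMINO_ACID_CLASSES.items()
    for residue in members
}


def type_of_change(coding, aa_before=None, aa_after=None, involves_gap=False):
    """Return the 'Type of change' label for one polymorphism."""
    if involves_gap:
        return "FRAMESHIFT"
    if not coding:
        return "SILENT-NONCODING"
    if aa_before == aa_after:
        return "SILENT-CODING"
    if CLASS_OF[aa_before] == CLASS_OF[aa_after]:
        return "CONSERVATIVE"
    return "NONCONSERVATIVE"
```

Under these assumptions, a Leu to Ile change is labeled CONSERVATIVE, while any change to or from Pro is labeled NONCONSERVATIVE because Pro shares a class with no other residue.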

"Protein classification of CuraGen gene" provides a generic class into which the 
protein is classified. During the course of the work leading to the filing of the four
applications identified above, approximately 100 classes of proteins were identified.

"Name of protein identified following a BLASTX analysis of the CuraGen sequence"
provides the database reference for the protein found to resemble the novel reference- 
polymorphism cognate pair most closely. (The next paragraph explains how a sequence was 
determined to be "novel"). 

"Similarity (pvalue) following a BLASTX analysis" provides the pvalue, a statistical
measure from the BLASTX analysis that the polymorphic sequence is similar to, and
therefore an allele of, the reference, or wild-type, sequence. In the present application, a
cutoff of pvalue > 1 x 10^-50 (entered, for example, as 1.0E-50 in the Table) is used to establish
that the reference-polymorphic cognate pairs are novel.

"Map location" provides any information available at the time of filing related to
localization of a gene on a chromosome.

The polymorphisms are arranged in the Table in the following order. 

SEQ ID NOs: 1-5696 are nucleotide sequences for SNPs that are silent. 

SEQ ID NOs: 5697-6011 are nucleotide sequences for SNPs that lead to conservative
amino acid changes.

SEQ ID NOs: 6012-6740 are nucleotide sequences for SNPs that lead to 
nonconservative amino acid changes. 

SEQ ID NOs: 6741-7867 are nucleotide sequences for SNPs that involve a gap. With 
respect to the reference or wild-type sequence at the position of the polymorphism, the allelic 
cSNP introduces an additional nucleotide (an insertion) or deletes a nucleotide (a deletion).
An SNP that involves a gap generates a frame shift. 

SEQ ID NOs: 7868-8182 are the amino acid sequences centered at the polymorphic 
amino acid residue for the protein products provided by SNPs that lead to conservative amino 
acid changes. 7 or 8 amino acids on either side of the polymorphic site are shown. The order 
in which these sequences appear mirrors the order of presentation of the cognate nucleotide
sequences, and is set forth in the Table. 

SEQ ID NOs: 8183-8911 are the amino acid sequences centered at the polymorphic
amino acid residue for the protein products provided by SNPs that lead to nonconservative
amino acid changes. 7 or 8 amino acids on either side of the polymorphic site are shown.
The order in which these sequences appear mirrors the order of presentation of the cognate
nucleotide sequences, and is set forth in the Table.

SEQ ID NOs: 8912-10038 are the amino acid sequences centered at the polymorphic
amino acid residue for the protein products provided by SNPs that lead to frameshift-induced
amino acid changes. 7 or 8 amino acids on either side of the polymorphic site are shown. The
order in which these sequences appear mirrors the order of presentation of the cognate
nucleotide sequences, and is set forth in the Table.
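
The SEQ ID NO: ranges listed above can be captured in a small lookup. This is a sketch only; the function names are hypothetical. The fixed offset between a nucleotide entry and its cognate amino acid entry is an inference from the statement that the amino acid sequences mirror the order of the nucleotide sequences (each pair of ranges has the same size), and the Table remains the authoritative source for the pairing.

```python
# (first, last, molecule, kind of change), as listed in the text.
SEQ_ID_RANGES = [
    (1, 5696, "nucleotide", "silent"),
    (5697, 6011, "nucleotide", "conservative"),
    (6012, 6740, "nucleotide", "nonconservative"),
    (6741, 7867, "nucleotide", "gap"),
    (7868, 8182, "amino acid", "conservative"),
    (8183, 8911, "amino acid", "nonconservative"),
    (8912, 10038, "amino acid", "frameshift"),
]

# Assumption: the mirrored ordering implies a constant offset (7868 - 5697).
PROTEIN_OFFSET = 7868 - 5697


def describe_seq_id(seq_id):
    """Return (molecule, kind of change) for a SEQ ID NO."""
    for first, last, molecule, change in SEQ_ID_RANGES:
        if first <= seq_id <= last:
            return molecule, change
    raise ValueError(f"SEQ ID NO: {seq_id} is outside 1-10038")


def cognate_protein_seq_id(nucleotide_seq_id):
    """Return the inferred amino acid SEQ ID NO, or None for silent SNPs."""
    molecule, change = describe_seq_id(nucleotide_seq_id)
    if molecule != "nucleotide" or change == "silent":
        return None
    return nucleotide_seq_id + PROTEIN_OFFSET
```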



Provided herein are compositions which include, or are capable of detecting, nucleic
acid sequences having these polymorphisms, as well as methods of using nucleic acids.

Identification of Individuals Carrying SNPs

Individuals carrying polymorphic alleles of the invention may be detected at either the
DNA, the RNA, or the protein level using a variety of techniques that are well known in the
art. Strategies for identification and detection are described in, e.g., EP 730,663, EP 717,113,
and PCT US97/02102. The present methods usually employ pre-characterized
polymorphisms. That is, the genotyping location and nature of polymorphic forms present at
a site have already been determined. The availability of this information allows sets of
probes to be designed for specific identification of the known polymorphic forms.

Many of the methods described below require amplification of DNA from target
samples. This can be accomplished by, e.g., PCR. See generally PCR Technology: Principles
and Applications for DNA Amplification (ed. H.A. Erlich, Freeman Press, NY, NY,
1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic
Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et
al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press,
Oxford); and U.S. Patent 4,683,202.

The phrase "recombinant protein" or "recombinantly produced protein" refers to a
peptide or protein produced using non-native cells that do not have an endogenous copy of
DNA able to express the protein. In particular, as used herein, a recombinantly produced
protein relates to the gene product of a polymorphic allele, i.e., a "polymorphic protein"
containing an altered amino acid at the site of translation of the nucleotide polymorphism.
The cells produce the protein because they have been genetically altered by the introduction
of the appropriate nucleic acid sequence. The recombinant protein will not be found in
association with proteins and other subcellular components normally associated with the cells
producing the protein. The terms "protein" and "polypeptide" are used interchangeably
herein.

The phrase "substantially purified" or "isolated" when referring to a nucleic acid,
peptide or protein, means that the chemical composition is in a milieu containing fewer, or
preferably, essentially none, of other cellular components with which it is naturally
associated. Thus, the phrase "isolated" or "substantially pure" refers to nucleic acid
preparations that lack at least one protein or nucleic acid normally associated with the nucleic
acid in a host cell. It is preferably in a homogeneous state although it can be in either a dry or
aqueous solution. Purity and homogeneity are typically determined using analytical
chemistry techniques such as gel electrophoresis or high performance liquid chromatography.
Generally, a substantially purified or isolated nucleic acid or protein will comprise more than
80% of all macromolecular species present in the preparation. Preferably, the nucleic acid or
protein is purified to represent greater than 90% of all macromolecular species present. More
preferably the nucleic acid or protein is purified to greater than 95%, and most preferably the
nucleic acid or protein is purified to essential homogeneity, wherein other macromolecular
species are not detected by conventional analytical procedures.

The genomic DNA used for the diagnosis may be obtained from any nucleated cells
of the body, such as those present in peripheral blood, urine, saliva, buccal samples, surgical
specimens, and autopsy specimens. The DNA may be used directly or may be amplified
enzymatically in vitro through use of PCR (Saiki et al. Science 239:487-491 (1988)) or other
in vitro amplification methods such as the ligase chain reaction (LCR) (Wu and Wallace
Genomics 4:560-569 (1989)), strand displacement amplification (SDA) (Walker et al. Proc.
Natl. Acad. Sci. U.S.A. 89:392-396 (1992)), self-sustained sequence replication (3SR)
(Fahy et al. PCR Methods Appl. 1:25-33 (1992)), prior to mutation analysis.

The method for preparing nucleic acids in a form that is suitable for mutation
detection is well known in the art. A "nucleic acid" is a deoxyribonucleotide or
ribonucleotide polymer in either single- or double-stranded form, including known analogs of
natural nucleotides unless otherwise indicated. The term "nucleic acids", as used herein,
refers to either DNA or RNA. "Nucleic acid sequence" or "polynucleotide sequence" refers
to a single-stranded sequence of deoxyribonucleotide or ribonucleotide bases read from the 5'
end to the 3' end. The direction of 5' to 3' addition of nascent RNA transcripts is referred to
as the transcription direction; sequence regions on the DNA strand having the same sequence
as the RNA and which are beyond the 5' end of the RNA transcript in the 5' direction are
referred to as "upstream sequences"; sequence regions on the DNA strand having the same
sequence as the RNA and which are beyond the 3' end of the RNA transcript in the 3'
direction are referred to as "downstream sequences". The term includes both self-replicating
plasmids, infectious polymers of DNA or RNA and nonfunctional DNA or RNA. The
complement of any nucleic acid sequence of the invention is understood to be included in the
definition of that sequence. "Nucleic acid probes" may be DNA or RNA fragments.

The detection of polymorphisms in specific DNA sequences can be accomplished by
a variety of methods including, but not limited to, restriction-fragment-length-polymorphism
detection based on allele-specific restriction-endonuclease cleavage (Kan and Dozy Lancet
ii:910-912 (1978)), hybridization with allele-specific oligonucleotide probes (Wallace et al.
Nucl. Acids Res. 6:3543-3557 (1978)), including immobilized oligonucleotides (Saiki et al.
Proc. Natl. Acad. Sci. U.S.A. 86:6230-6234 (1989)) or oligonucleotide arrays (Maskos and
Southern Nucl. Acids Res. 21:2269-2270 (1993)), allele-specific PCR (Newton et al. Nucl.
Acids Res. 17:2503-2516 (1989)), mismatch-repair detection (MRD) (Faham and Cox
Genome Res. 5:474-482 (1995)), binding of MutS protein (Wagner et al. Nucl. Acids Res.
23:3944-3948 (1995)), denaturing-gradient gel electrophoresis (DGGE) (Fischer and Lerman
Proc. Natl. Acad. Sci. U.S.A. 80:1579-1583 (1983)), single-strand-conformation-
polymorphism detection (Orita et al. Genomics 5:874-879 (1983)), RNAase cleavage at
mismatched base-pairs (Myers et al. Science 230:1242 (1985)), chemical (Cotton et al. Proc.
Natl. Acad. Sci. U.S.A. 85:4397-4401 (1988)) or enzymatic (Youil et al. Proc. Natl. Acad. Sci.
U.S.A. 92:87-91 (1995)) cleavage of heteroduplex DNA, methods based on allele-specific
primer extension (Syvanen et al. Genomics 8:684-692 (1990)), genetic bit analysis (GBA)
(Nikiforov et al. Nucl. Acids Res. 22:4167-4175 (1994)), the oligonucleotide-ligation assay (OLA)
(Landegren et al. Science 241:1077 (1988)), the allele-specific ligation chain reaction (LCR)
(Barany Proc. Natl. Acad. Sci. U.S.A. 88:189-193 (1991)), gap-LCR (Abravaya et al.
Nucl. Acids Res. 23:675-682 (1995)), radioactive and/or fluorescent DNA sequencing using
standard procedures well known in the art, and peptide nucleic acid (PNA) assays (Orum et
al., Nucl. Acids Res. 21:5332-5356 (1993); Thiede et al., Nucl. Acids Res. 24:983-984
(1996)).

"Specific hybridization" or "selective hybridization" refers to the binding, or 
duplexing, of a nucleic acid molecule only to a second particular nucleotide sequence to 
which the nucleic acid is complementary, under suitably stringent conditions when that 
sequence is present in a complex mixture (e.g., total cellular DNA or RNA). "Stringent
conditions" are conditions under which a probe will hybridize to its target subsequence, but
to no other sequences. Stringent conditions are sequence-dependent and are different in
different circumstances. Longer sequences hybridize specifically at higher temperatures than
shorter ones. Generally, stringent conditions are selected such that the temperature is about
5°C lower than the thermal melting point (Tm) for the specific sequence to which
hybridization is intended to occur at a defined ionic strength and pH. The Tm is the
temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50%
of the target sequence hybridizes to the complementary probe at equilibrium. Typically,
stringent conditions include a salt concentration of at least about 0.01 to about 1.0 M Na ion
concentration (or other salts), at pH 7.0 to 8.3. The temperature is at least about 30°C for
short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also be achieved with the
addition of destabilizing agents such as formamide. For example, conditions of 5X SSPE
(750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C
are suitable for allele-specific probe hybridization.
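
As a rough illustration of the relationship between Tm and a stringent hybridization temperature, the sketch below estimates Tm for a short probe and subtracts about 5°C. The Tm estimate uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an assumption introduced here and is not a method given in this specification; it ignores the ionic strength, pH, and concentration dependence noted above, so in practice conditions are determined empirically. The 1X SSPE composition is simply the stated 5X composition divided by five. All function names are hypothetical.

```python
def wallace_tm(probe):
    """Rule-of-thumb Tm (deg C) for a short probe: 2*(A+T) + 4*(G+C)."""
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("probe must contain only A, C, G, T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(probe, offset=5.0):
    """Temperature about `offset` deg C below the estimated Tm."""
    return wallace_tm(probe) - offset


# 1X SSPE in mM, derived from the 5X composition stated in the text.
SSPE_1X_MM = {"NaCl": 150.0, "NaPhosphate": 10.0, "EDTA": 1.0}


def sspe_composition(fold):
    """Millimolar composition of an SSPE buffer at the given fold strength."""
    return {name: conc * fold for name, conc in SSPE_1X_MM.items()}
```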

"Complementary" or "target" nucleic acid sequences refer to those nucleic acid
sequences which selectively hybridize to a nucleic acid probe. Proper annealing conditions
depend, for example, upon a probe's length, base composition, and the number of
mismatches and their position on the probe, and must often be determined empirically. For
discussions of nucleic acid probe design and annealing conditions, see, for example,
Sambrook et al., or Current Protocols in Molecular Biology, F. Ausubel et al., ed., Greene
Publishing and Wiley-Interscience, New York (1987).

A perfectly matched probe has a sequence perfectly complementary to a particular
target sequence. The test probe is typically perfectly complementary to a portion of the target
sequence. A "polymorphic" marker or site is the locus at which a sequence difference occurs
with respect to a reference sequence. Polymorphic markers include restriction fragment
length polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions,
minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple
sequence repeats, and insertion elements such as Alu. The reference allelic form may be, for
example, the most abundant form in a population, or the first allelic form to be identified, and
other allelic forms are designated as alternative, variant or polymorphic alleles. The allelic
form occurring most frequently in a selected population is sometimes referred to as the "wild
type" form, and herein may also be referred to as the "reference" form. Diploid organisms
may be homozygous or heterozygous for allelic forms. A diallelic polymorphism has two
distinguishable forms (i.e., base sequences), and a triallelic polymorphism has three such
forms.

As used herein an "oligonucleotide" is a single-stranded nucleic acid ranging in length
from 2 to about 60 bases. Oligonucleotides are often synthetic but can also be produced from
naturally occurring polynucleotides. A probe is an oligonucleotide capable of binding to a
target nucleic acid of a complementary sequence through one or more types of chemical
bonds, usually through complementary base pairing via hydrogen bond formation.
Oligonucleotide probes are often between 5 and 60 bases, and, in specific embodiments,
may be between 10-40, or 15-30 bases long. An oligonucleotide probe may include natural
(i.e., A, G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the
bases in an oligonucleotide probe may be joined by a linkage other than a phosphodiester
bond, such as a phosphoramidite linkage or a phosphorothioate linkage, or they may be
peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than
by phosphodiester bonds, so long as it does not interfere with hybridization.

As used herein, the term "primer" refers to a single-stranded oligonucleotide which
acts as a point of initiation of template-directed DNA synthesis under appropriate conditions
(e.g., in the presence of four different nucleoside triphosphates and a polymerization agent,
such as DNA polymerase, RNA polymerase or reverse transcriptase) in an appropriate buffer
and at a suitable temperature. The appropriate length of a primer depends on the intended
use of the primer, but typically ranges from 15 to 30 nucleotides. Short primer molecules
generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. A primer need not be perfectly complementary to the exact sequence of the
template, but should be sufficiently complementary to hybridize with it. The term "primer
site" refers to the sequence of the target DNA to which a primer hybridizes. The term
"primer pair" refers to a set of primers including a 5' (upstream) primer that hybridizes with
the 5' end of the DNA sequence to be amplified and a 3' (downstream) primer that hybridizes
with the complement of the 3' end of the sequence to be amplified.
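
The primer pair definition above can be illustrated with a short sketch. It is a simplification under stated assumptions: the target is given as the 5' to 3' sequence of one strand, both primers are perfectly matched and of equal length, and the function names are hypothetical. Real primer design would also weigh melting temperature and secondary structure, which are not modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pair(target, length=20):
    """Return (upstream primer, downstream primer), both written 5' to 3'.

    The upstream primer matches the 5' end of the given strand; the
    downstream primer is the reverse complement of its 3' end.
    """
    if not 15 <= length <= 30:
        raise ValueError("primers typically range from 15 to 30 nucleotides")
    if len(target) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    upstream = target[:length].upper()
    downstream = reverse_complement(target[-length:])
    return upstream, downstream
```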

DNA fragments can be prepared, for example, by digesting plasmid DNA, or by use
of PCR. Oligonucleotides for use as primers or probes are chemically synthesized by
methods known in the field of the chemical synthesis of polynucleotides, including by way of
non-limiting example the phosphoramidite method described by Beaucage and Carruthers,
Tetrahedron Lett. 22:1859-1862 (1981) and the triester method provided by Matteucci, et al.,
J. Am. Chem. Soc. 103:3185 (1981), both incorporated herein by reference. These
syntheses may employ an automated synthesizer, as described in Needham-VanDevanter,
D.R., et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of oligonucleotides may
be carried out by either native acrylamide gel electrophoresis or by anion-exchange HPLC as
described in Pearson, J.D. and Regnier, F.E., J. Chrom., 255:137-149 (1983). A double
stranded fragment may then be obtained, if desired, by annealing appropriate complementary
single strands together under suitable conditions or by synthesizing the complementary strand
using a DNA polymerase with an appropriate primer sequence. Where a specific sequence
for a nucleic acid probe is given, it is understood that the complementary strand is also
identified and included. The complementary strand will work equally well in situations
where the target is a double-stranded nucleic acid.

The sequence of the synthetic oligonucleotide or of any nucleic acid fragment can be
obtained using either the dideoxy chain termination method or the Maxam-Gilbert
method (see Sambrook et al., Molecular Cloning - a Laboratory Manual (2nd Ed.), Vols. 1-3,
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989), which is
incorporated herein by reference and is hereinafter referred to as "Sambrook et al.";
Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, New York) (1988)).
Oligonucleotides useful in diagnostic assays are typically at least 8 consecutive nucleotides in
length, and may range upwards of 18 nucleotides in length to greater than 100 or more
consecutive nucleotides.

Another aspect of the invention pertains to isolated antisense nucleic acid molecules
that are hybridizable to or complementary to the nucleic acid molecule comprising the SNP-
containing nucleotide sequences of the invention, or fragments, analogs or derivatives
thereof. An "antisense" nucleic acid comprises a nucleotide sequence that is complementary
to a "sense" nucleic acid encoding a protein, e.g., complementary to the coding strand of a
double-stranded cDNA molecule or complementary to an mRNA sequence. In specific
aspects, antisense nucleic acid molecules are provided that comprise a sequence
complementary to at least about 10, about 25, about 50, or about 60 nucleotides or an entire
SNP coding strand, or to only a portion thereof.

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding
region" of the coding strand of a polymorphic nucleotide sequence of the invention. The term
"coding region" refers to the region of the nucleotide sequence comprising codons which are
translated into amino acids. In another embodiment, the antisense nucleic acid molecule is
antisense to a "noncoding region" of the coding strand of a nucleotide sequence of the
invention. The term "noncoding region" refers to 5' and 3' sequences which flank the coding
region that are not translated into amino acids (i.e., also referred to as 5' and 3' untranslated
regions).

Given the coding strand sequences disclosed herein, antisense nucleic acids of the
invention can be designed according to the rules of Watson and Crick or Hoogsteen base
pairing. For example, the antisense nucleic acid molecule can generally be complementary to
the entire coding region of an mRNA, but more preferably as embodied herein, it is an
oligonucleotide that is antisense to only a portion of the coding or noncoding region of the
mRNA. An antisense oligonucleotide can range in length between about 5 and about 60
nucleotides, preferably between about 10 and about 45 nucleotides, more preferably between
about 15 and 40 nucleotides, and still more preferably between about 15 and 30 in length. An
antisense nucleic acid of the invention can be constructed using chemical synthesis or
enzymatic ligation reactions using procedures known in the art. For example, an antisense
nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally
occurring nucleotides or variously modified nucleotides designed to increase the biological
stability of the molecules or to increase the physical stability of the duplex formed between
the antisense and sense nucleic acids, e.g., phosphorothioate derivatives and acridine
substituted nucleotides can be used.
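
Designing an antisense oligonucleotide by Watson-Crick pairing reduces to taking the reverse complement of a chosen portion of the sense sequence. The sketch below assumes an unmodified DNA oligonucleotide and a caller-supplied start position; the function name and parameters are hypothetical, and Hoogsteen pairing and modified nucleotides are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(mrna, start, length=20):
    """Return a DNA oligonucleotide (5' to 3') antisense to part of an mRNA.

    `start` is the 0-based offset of the targeted portion; `length` follows
    the broadest range given in the text (about 5 to about 60 nucleotides).
    """
    if not 5 <= length <= 60:
        raise ValueError("length should be between about 5 and about 60")
    # Work in the DNA alphabet so that U in the mRNA pairs like T.
    sense = mrna.upper().replace("U", "T")
    segment = sense[start:start + length]
    if len(segment) < length:
        raise ValueError("targeted portion runs past the end of the sequence")
    return segment.translate(COMPLEMENT)[::-1]
```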

Examples of modified nucleotides that can be used to generate the antisense nucleic
acid include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine,
xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-
2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine,
inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine,
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine,
pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil,
5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v),
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and
2,6-diaminopurine. Alternatively, the antisense nucleic acid can be produced biologically
using an expression vector into which a nucleic acid has been subcloned in an antisense
orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense
orientation to a target nucleic acid of interest, described further in the following section).

The antisense nucleic acid molecules of the invention are typically administered to a
subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or
genomic DNA encoding a polymorphic protein to thereby inhibit expression of the protein,
by inhibiting transcription and/or translation. The hybridization can be by conventional
nucleotide complementarity to form a stable duplex, or, for example, in the case of an
antisense nucleic acid molecule that binds to DNA duplexes, through specific interactions in
the major groove of the double helix. An example of a route of administration of antisense
nucleic acid molecules of the invention includes direct injection at a tissue site.
Alternatively, antisense nucleic acid molecules can be modified to target selected cells and
then administered systemically. For example, for systemic administration, antisense
molecules can be modified such that they specifically bind to receptors or antigens expressed
on a selected cell surface, e.g., by linking the antisense nucleic acid molecules to peptides or
antibodies that bind to cell surface receptors or antigens. The antisense nucleic acid
molecules can also be delivered to cells using the vectors described herein. To achieve
sufficient intracellular concentrations of antisense molecules, vector constructs in which the
antisense nucleic acid molecule is placed under the control of a strong pol II or pol III
promoter are preferred.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an
α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific
double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the
strands run parallel to each other (Gaultier et al. (1987) Nucleic Acids Res 15: 6625-6641).
The antisense nucleic acid molecule can also comprise a 2'-o-methylribonucleotide (Inoue et
al. (1987) Nucleic Acids Res 15: 6131-6148) or a chimeric RNA-DNA analogue (Inoue et al.
(1987) FEBS Lett 215: 327-330).


The following terms are used to describe the sequence relationships between two or
more nucleic acids or polynucleotides: "reference sequence", "comparison window",
"sequence identity", "percentage of sequence identity", and "substantial identity". A
"reference sequence" is a defined sequence used as a basis for a sequence comparison; a
reference sequence may be a subset of a larger sequence, for example, as a segment of a full-
length cDNA or gene sequence given in a sequence listing, or may comprise a complete
cDNA or gene sequence. Optimal alignment of sequences for aligning a comparison window
may, for example, be conducted by the local homology algorithm of Smith and Waterman
Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and
Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and
Lipman Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988), or by computerized
implementations of these algorithms (for example, GAP, BESTFIT, FASTA, and TFASTA in
the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575
Science Dr., Madison, WI).
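
To make the alignment step concrete, the following is a minimal global alignment in the style of Needleman and Wunsch. It is a teaching sketch, not the GAP or BESTFIT implementation: the scoring values (match +1, mismatch -1, linear gap -1) are arbitrary assumptions, and "percentage of sequence identity" is computed here as identical aligned positions over alignment length, which is one common convention rather than a definition taken from this specification.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap aligned positions as a percentage of alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)
```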

Techniques for nucleic acid manipulation of the nucleic acid sequences harboring the
cSNP's of the invention, such as subcloning nucleic acid sequences encoding polypeptides
into expression vectors, labeling probes, DNA hybridization, and the like, are described
generally in Sambrook et al. The phrase "nucleic acid sequence encoding" refers to a nucleic
acid which directs the expression of a specific protein, peptide or amino acid sequence. The
nucleic acid sequences include both the DNA strand sequence that is transcribed into RNA
and the RNA sequence that is translated into protein, peptide or amino acid sequence. The
nucleic acid sequences include both the full length nucleic acid sequences disclosed herein as
well as non-full length sequences derived from the full length protein. It is further
understood that the sequence includes the degenerate codons of the native sequence or
sequences which may be introduced to provide codon preference in a specific host cell.
Consequently, the principles of probe selection and array design can readily be extended to
analyze more complex polymorphisms (see EP 730,663). For example, to characterize a
triallelic SNP polymorphism, three groups of probes can be designed tiled on the three
polymorphic forms as described above. As a further example, to analyze a diallelic
polymorphism involving a deletion of a nucleotide, one can tile a first group of probes based
on the undeleted polymorphic form as the reference sequence and a second group of probes
based on the deleted form as the reference sequence.

For assays of genomic DNA, virtually any convenient biological tissue sample can be
used. Suitable samples include whole blood, semen, saliva, tears, urine, fecal material, sweat,
buccal, skin and hair. Genomic DNA is typically amplified before analysis. Amplification is
usually effected by PCR using primers flanking a suitable fragment, e.g., of 50-500
nucleotides, containing the locus of the polymorphism to be analyzed. Target is usually
labeled in the course of amplification. The amplification product can be RNA or DNA,
single stranded or double stranded. If double stranded, the amplification product is typically
denatured before application to an array. If genomic DNA is analyzed without amplification,
it may be desirable to remove RNA from the sample before applying it to the array. Such can
be accomplished by digestion with DNase-free RNase.

Detection of Polymorphisms in a Nucleic Acid Sample 

The SNPs disclosed herein can be used to determine which forms of a characterized 
polymorphism are present in individuals under analysis. 

The design and use of allele-specific probes for analyzing polymorphisms is described 
by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki, WO 
89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA 
from one individual but do not hybridize to the corresponding segment from another 
individual due to the presence of different polymorphic forms in the respective segments 
from the two individuals. Hybridization conditions should be sufficiently stringent that there 
is a significant difference in hybridization intensity between alleles, and preferably an 
essentially binary response, whereby a probe hybridizes to only one of the alleles. Some 
probes are designed to hybridize to a segment of target DNA such that the polymorphic site 
aligns with a central position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 7, 
8 or 9 position) of the probe. This design of probe achieves good discrimination in 
hybridization between different allelic forms. 
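
By way of illustration only, the placement of the polymorphic site at a central position of a 
probe can be sketched in Python as follows. The function name, its parameters and the example 
sequence are hypothetical and are not taken from the disclosure. The probes are written in the 
same sense as the reference sequence, so they would hybridize to the complementary strand of a 
denatured target. 

```python
def allele_specific_probes(reference, snp_index, variant_base,
                           length=15, snp_position=7):
    """Return (reference_probe, variant_probe), two probes of equal length.

    reference    : reference sequence, 5'->3', as a string
    snp_index    : 0-based index of the polymorphic site in `reference`
    variant_base : base present at that site in the variant allele
    length       : probe length (15 in the example given in the text)
    snp_position : 1-based position of the polymorphic site within the probe
    """
    start = snp_index - (snp_position - 1)
    end = start + length
    if start < 0 or end > len(reference):
        raise ValueError("not enough flanking sequence for the requested probe")
    ref_probe = reference[start:end]
    offset = snp_position - 1
    # The variant probe differs from the reference probe only at the SNP.
    var_probe = ref_probe[:offset] + variant_base + ref_probe[offset + 1:]
    return ref_probe, var_probe


# Hypothetical 24-base reference with a G>A polymorphism at index 10.
ref_probe, var_probe = allele_specific_probes("ACGTACGTACGTACGTACGTACGT", 10, "A")
print(ref_probe)
print(var_probe)
```

The two probes differ at a single, centrally located base, which is the arrangement the 
preceding paragraph associates with good discrimination between allelic forms. 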

Allele-specific probes are often used in pairs, one member of a pair showing a perfect 
match to a reference form of a target sequence and the other member showing a perfect match 
to a variant form. Several pairs of probes can then be immobilized on the same support for 
simultaneous analysis of multiple polymorphisms within the same target sequence. 

The polymorphisms can also be identified by hybridization to nucleic acid arrays, 
some examples of which are described in published PCT application WO 95/11995. WO 
95/11995 also describes subarrays that are optimized for detection of a variant form of a 
precharacterized polymorphism. Such a subarray contains probes designed to be 
complementary to a second reference sequence, which is an allelic variant of the first 
reference sequence. The second group of probes is designed by the same principles, except 
that the probes exhibit complementarity to the second reference sequence. The inclusion of 
a second group (or further groups) can be particularly useful for analyzing short 
subsequences of the primary reference sequence in which multiple mutations are expected to 
occur within a short distance commensurate with the length of the probes (e.g., two or more 
mutations within 9 to 21 bases). 

An allele-specific primer hybridizes to a site on a target DNA overlapping a 
polymorphism and only primes amplification of an allelic form to which the primer exhibits 
perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989). This 
primer is used in conjunction with a second primer which hybridizes at a distal site. 
Amplification proceeds from the two primers, resulting in a detectable product which 
indicates the particular allelic form is present. A control is usually performed with a second 
pair of primers, one of which shows a single base mismatch at the polymorphic site and the 
other of which exhibits perfect complementarity to a distal site. The single-base mismatch 
prevents amplification and no detectable product is formed. The method works best when 
the mismatch is included in the 3'-most position of the oligonucleotide aligned with the 
polymorphism because this position is most destabilizing to elongation from the primer (see, 
e.g., WO 93/22456). 
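
The arrangement in which the 3'-most base of the primer is aligned with the polymorphism can 
be sketched as follows. This is a simplified illustration under stated assumptions: the 
function names and the example sequence are hypothetical, the primers are written in the sense 
of the reference strand, and the extension check is a toy rule (extension only when the 
3'-terminal base matches), not a model of polymerase behavior. 

```python
def allele_specific_primers(reference, snp_index, alleles, length=20):
    """Build one primer per allele, each ending (3' end) on the polymorphic site.

    reference : reference sequence, 5'->3'
    snp_index : 0-based index of the polymorphic site
    alleles   : iterable of bases observed at the site, e.g. "CT"
    length    : primer length (hypothetical default)
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for the requested primer")
    upstream = reference[start:snp_index]
    # All primers share the upstream bases and differ only in the 3'-most base.
    return {allele: upstream + allele for allele in alleles}


def would_extend(primer, template, snp_index):
    """Toy rule: the primer is extended only if its 3' base matches the template."""
    return primer[-1] == template[snp_index]


reference = "ACGTACGTACGTACGTACGTACGTACGT"   # hypothetical sequence, C at index 21
primers = allele_specific_primers(reference, 21, "CT")
for allele, primer in primers.items():
    print(allele, primer, would_extend(primer, reference, 21))
```

Under this toy rule only the primer ending in the base carried by the template yields a 
product, which mirrors the control reaction described above. 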

Amplification products generated using the polymerase chain reaction can be 
analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be 
identified based on the different sequence-dependent melting properties and electrophoretic 
migration of DNA in solution. Erlich, ed., PCR Technology, Principles and Applications 
for DNA Amplification (W.H. Freeman and Co., New York, 1992), Chapter 7. 

Alleles of target sequences can be differentiated using single-strand conformation 
polymorphism analysis, which identifies base differences by alteration in electrophoretic 
migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. 
Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated and heated or 
otherwise denatured, to form single stranded amplification products. Single-stranded 
nucleic acids may refold or form secondary structures which are partially dependent on the 
base sequence. The different electrophoretic mobilities of single-stranded amplification 
products can be related to base-sequence differences between alleles of target sequences. 

The genotype of an individual with respect to a pathology suspected of being caused 
by a genetic polymorphism may be assessed by association analysis. Phenotypic traits 
suitable for association analysis include diseases that have known but hitherto unmapped 
genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, 
muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial 
hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von Willebrand's 
disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, 
Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute intermittent porphyria). 

Phenotypic traits also include symptoms of, or susceptibility to, multifactorial 
diseases of which a component is or may be genetic, such as autoimmune diseases, 
inflammation, cancer, diseases of the nervous system, and infection by pathogenic 
microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, 
multiple sclerosis, diabetes (insulin-dependent and non-independent), systemic lupus 
erythematosus and Graves disease. Some examples of cancers include cancers of the bladder, 
brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, 
prostate, skin, stomach and uterus. Phenotypic traits also include characteristics such as 
longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and 
susceptibility or receptivity to particular drugs or therapeutic treatments. 

Determination of which polymorphic forms occupy a set of polymorphic sites in an 
individual identifies a set of polymorphic forms that distinguishes the individual. See 
generally National Research Council, The Evaluation of Forensic DNA Evidence (Eds. 
Pollard et al., National Academy Press, DC, 1996). Since the polymorphic sites are within a 
50,000 bp region in the human genome, the probability of recombination between these 
polymorphic sites is low. That low probability means the haplotype (the set of all 10 
polymorphic sites) set forth in this application should be inherited without change for at least 
several generations. The more sites that are analyzed, the lower the probability that the set of 
polymorphic forms in one individual is the same as that in an unrelated individual. 
Preferably, if multiple sites are analyzed, the sites are unlinked. Thus, polymorphisms of the 
invention are often used in conjunction with polymorphisms in distal genes. Preferred 
polymorphisms for use in forensics are diallelic because the population frequencies of two 
polymorphic forms can usually be determined with greater accuracy than those of multiple 
polymorphic forms at multi-allelic loci. 

The capacity to identify a distinguishing or unique set of forensic markers in an 
individual is useful for forensic analysis. For example, one can determine whether a blood 
sample from a suspect matches a blood or other tissue sample from a crime scene by 
determining whether the set of polymorphic forms occupying selected polymorphic sites is 
the same in the suspect and the sample. If the set of polymorphic markers does not match 
between a suspect and a sample, it can be concluded (barring experimental error) that the 
suspect was not the source of the sample. If the set of markers does match, one can conclude 
that the DNA from the suspect is consistent with that found at the crime scene. If frequencies 
of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a 
suitable population of individuals), one can perform a statistical analysis to determine the 
probability that a match of suspect and crime scene sample would occur by chance. 

p(ID) is the probability that two random individuals have the same polymorphic or 
allelic form at a given polymorphic site. In diallelic loci, four genotypes are possible: AA, 
AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with 
frequencies x and y, the probability of each genotype in a diploid organism is (see WO 
95/12607): 

Homozygote: p(AA) = x^2 

Homozygote: p(BB) = y^2 = (1-x)^2 

Single Heterozygote: p(AB) = p(BA) = xy = x(1-x) 

Both Heterozygotes: p(AB+BA) = 2xy = 2x(1-x) 

The probability of identity at one locus (i.e., the probability that two individuals, picked at 
random from a population, will have identical polymorphic forms at a given locus) is given by 
the equation: 

p(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2. 

These calculations can be extended for any number of polymorphic forms at a given 
locus. For example, the probability of identity p(ID) for a 3-allele system where the alleles 
have the frequencies in the population of x, y and z, respectively, is equal to the sum of the 
squares of the genotype frequencies: 

p(ID) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4 

In a locus of n alleles, the appropriate binomial expansion is used to calculate p(ID) and 
p(exc). 

The cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci 
is determined by multiplying the probabilities provided by each locus: 

cum p(ID) = p(ID1) p(ID2) p(ID3) . . . p(IDn) 

The cumulative probability of non-identity for n loci (i.e., the probability that two random 
individuals will be different at 1 or more loci) is given by the equation: 

cum p(nonID) = 1 - cum p(ID). 

If several polymorphic loci are tested, the cumulative probability of non-identity for random 
individuals becomes very high (e.g., one billion to one). Such probabilities can be taken into 
account together with other evidence in determining the guilt or innocence of the suspect. 
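
The identity calculations above can be illustrated with a short Python sketch. It assumes, as 
the equations do, that genotype frequencies follow from the allele frequencies (x^2 and 2xy for 
a diallelic locus) and that the loci are unlinked. The function names and the allele 
frequencies used in the example are hypothetical. 

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_frequencies(allele_freqs):
    """Frequencies of the unordered genotypes at one locus.

    Homozygotes have frequency f_i^2 and heterozygotes 2*f_i*f_j.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = []
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        f = allele_freqs[i] * allele_freqs[j]
        freqs.append(f if i == j else 2.0 * f)
    return freqs


def p_identity(allele_freqs):
    """p(ID): sum of the squares of the genotype frequencies at one locus."""
    return sum(g * g for g in genotype_frequencies(allele_freqs))


def cumulative_p_identity(loci):
    """cum p(ID): product of p(ID) over unlinked loci."""
    return prod(p_identity(freqs) for freqs in loci)


def cumulative_p_nonidentity(loci):
    """cum p(nonID) = 1 - cum p(ID)."""
    return 1.0 - cumulative_p_identity(loci)


# Hypothetical example: ten diallelic loci, each with x = y = 0.5.
loci = [[0.5, 0.5]] * 10
print(p_identity([0.5, 0.5]))
print(cumulative_p_identity(loci))
print(cumulative_p_nonidentity(loci))
```

For a diallelic locus the function reduces to (x^2)^2 + (2xy)^2 + (y^2)^2, and for three 
alleles it reproduces the six-term sum given above. 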

The object of paternity testing is usually to determine whether a male is the father of a 
child. In most cases, the mother of the child is known and thus, the mother's contribution to 
the child's genotype can be traced. Paternity testing investigates whether the part of the child's 
genotype not attributable to the mother is consistent with that of the putative father. Paternity 
testing can be performed by analyzing sets of polymorphisms in the putative father and the 
child. 

If the set of polymorphisms in the child attributable to the father does not match the 
putative father, it can be concluded, barring experimental error, that the putative father is not 
the real father. If the set of polymorphisms in the child attributable to the father does match 
the set of polymorphisms of the putative father, a statistical calculation can be performed to 
determine the probability of coincidental match. 

The probability of parentage exclusion (representing the probability that a random 
male will have a polymorphic form at a given polymorphic site that makes him incompatible 
as the father) is given by the equation (see WO 95/12607): 

p(exc) = xy(1-xy) 

where x and y are the population frequencies of alleles A and B of a diallelic polymorphic 
site. (At a triallelic site, p(exc) = xy(1-xy) + yz(1-yz) + xz(1-xz) + 3xyz(1-xyz), where x, y and 
z are the respective population frequencies of alleles A, B and C.) The probability of 
non-exclusion is: 

p(non-exc) = 1 - p(exc) 

The cumulative probability of non-exclusion (representing the value obtained when n loci are 
used) is thus: 

cum p(non-exc) = p(non-exc1) p(non-exc2) p(non-exc3) . . . p(non-excn) 

The cumulative probability of exclusion for n loci (representing the probability that a random 
male will be excluded) is: 

cum p(exc) = 1 - cum p(non-exc). 

If several polymorphic loci are included in the analysis, the cumulative probability of 
exclusion of a random male is very high. This probability can be taken into account in 
assessing the liability of a putative father whose polymorphic marker set matches the child's 
polymorphic marker set attributable to his/her father. 
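
A minimal Python sketch of the exclusion calculation follows. It takes the diallelic form 
p(exc) = xy(1-xy) and the triallelic form quoted in the text, and it assumes independent loci 
when combining them. The function names and the allele frequencies in the example are 
hypothetical. 

```python
from math import prod


def p_exclusion_diallelic(x):
    """p(exc) = xy(1 - xy) for a diallelic site with y = 1 - x."""
    y = 1.0 - x
    return x * y * (1.0 - x * y)


def p_exclusion_triallelic(x, y, z):
    """p(exc) for a triallelic site with allele frequencies x, y and z."""
    return (x * y * (1.0 - x * y)
            + y * z * (1.0 - y * z)
            + x * z * (1.0 - x * z)
            + 3.0 * x * y * z * (1.0 - x * y * z))


def cumulative_p_exclusion(p_exc_per_locus):
    """cum p(exc) = 1 - product over loci of (1 - p(exc))."""
    return 1.0 - prod(1.0 - p for p in p_exc_per_locus)


# Hypothetical example: ten diallelic sites, each with x = y = 0.5.
per_locus = [p_exclusion_diallelic(0.5)] * 10
print(per_locus[0])
print(cumulative_p_exclusion(per_locus))
```

The example shows how the per-locus value, which is modest for a single diallelic site, 
accumulates as further loci are added. 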

The polymorphisms of the invention may contribute to the phenotype of an organism 
in different ways. Some polymorphisms occur within a protein coding sequence and 
contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial 
or detrimental, or both beneficial and detrimental, depending on the circumstances. For 
example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous 
sickle cell mutation is usually lethal. Other polymorphisms occur in noncoding regions but 
may exert phenotypic effects indirectly via influence on replication, transcription, and 
translation. A single polymorphism may affect more than one phenotypic trait. Likewise, a 
single phenotypic trait may be affected by polymorphisms in different genes. Further, some 
polymorphisms predispose an individual to a distinct mutation that is causally related to a 
certain phenotype. 

Phenotypic traits include diseases that have known but hitherto unmapped genetic 
components. Phenotypic traits also include symptoms of, or susceptibility to, multifactorial 
diseases of which a component is or may be genetic, such as autoimmune diseases, 
inflammation, cancer, diseases of the nervous system, and infection by pathogenic 
microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, 
multiple sclerosis, diabetes (insulin-dependent and non-independent), systemic lupus 
erythematosus and Graves disease. Some examples of cancers include cancers of the bladder, 
brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, 
prostate, skin, stomach and uterus. Phenotypic traits also include characteristics such as 
longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and 
susceptibility or receptivity to particular drugs or therapeutic treatments. 

Correlation is performed for a population of individuals who have been tested for the 
presence or absence of a phenotypic trait of interest and for polymorphic marker sets. To 
perform such analysis, the presence or absence of a set of polymorphisms (i.e., a polymorphic 
set) is determined for a set of the individuals, some of whom exhibit a particular trait, and 
some of whom exhibit lack of the trait. The alleles of each polymorphism of the set are then 
reviewed to determine whether the presence or absence of a particular allele is associated 
with the trait of interest. Correlation can be performed by standard statistical methods and 
statistically significant correlations between polymorphic form(s) and phenotypic 
characteristics are noted. For example, it might be found that the presence of allele A1 at 
polymorphism A correlates with heart disease. As a further example, it might be found that 
the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B 
correlates with increased milk production of a farm animal. 
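
The text refers only to "standard statistical methods". As one possible illustration, and as an 
assumption rather than the method of the disclosure, a Pearson chi-square test on a 2x2 table of 
allele presence against trait presence can be sketched in Python. The function name and the 
counts are hypothetical. 

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

                      trait    no trait
    allele present      a         b
    allele absent       c         d

    Returns (chi2, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 of 50 carriers show the trait, 10 of 50 non-carriers do.
chi2, p = chi_square_2x2(30, 20, 10, 40)
print(chi2, p)
```

A small p-value would indicate that the allele and the trait are not distributed independently 
in the sampled population. It does not by itself establish that the polymorphism causes the 
trait. 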

Such correlations can be exploited in several ways. In the case of a strong correlation 
between a set of one or more polymorphic forms and a disease for which treatment is 
available, detection of the polymorphic form set in a human or animal patient may justify 
immediate administration of treatment, or at least the institution of regular monitoring of the 
patient. Detection of a polymorphic form correlated with serious disease in a couple 
contemplating a family may also be valuable to the couple in their reproductive decisions. 
For example, the female partner might elect to undergo in vitro fertilization to avoid the 
possibility of transmitting such a polymorphism from her husband to her offspring. In the 
case of a weaker, but still statistically significant correlation between a polymorphic set and 
human disease, immediate therapeutic intervention or monitoring may not be justified. 
Nevertheless, the patient can be motivated to begin simple life-style changes (e.g., diet, 
exercise) that can be accomplished at little cost to the patient but confer potential benefits in 
reducing the risk of conditions to which the patient may have increased susceptibility by 
virtue of variant alleles. Identification of a polymorphic set in a patient correlated with 
enhanced receptiveness to one of several treatment regimes for a disease indicates that this 
treatment regime should be followed. 

For animals and plants, correlations between characteristics and phenotype are useful 
for breeding for desired characteristics. For example, Beitz et al., U.S. Pat. No. 5,292,639 
discuss use of bovine mitochondrial polymorphisms in a breeding program to improve milk 
production in cows. To evaluate the effect of mtDNA D-loop sequence polymorphism on 
milk production, each cow was assigned a value of 1 if variant or 0 if wild type with respect 
to a prototypical mitochondrial DNA sequence at each of 17 locations considered. 

The previous section concerns identifying correlations between phenotypic traits and 
polymorphisms that directly or indirectly contribute to those traits. The present section 
describes identification of a physical linkage between a genetic locus associated with a trait 
of interest and polymorphic markers that are not associated with the trait, but are in physical 
proximity with the genetic locus responsible for the trait and co-segregate with it. Such 
analysis is useful for mapping a genetic locus associated with a phenotypic trait to a 
chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., 
Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. 
(USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., 
Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known 
as directional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, 
Nature Genetics 1, 3-6 (1992) (each of which is incorporated by reference in its entirety for 
all purposes). 

Linkage studies are typically performed on members of a family. Available members 
of the family are characterized for the presence or absence of a phenotypic trait and for a set 
of polymorphic markers. The distribution of polymorphic markers in an informative meiosis 
is then analyzed to determine which polymorphic markers co-segregate with a phenotypic 
trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 
(1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 
(1991). 

Linkage is analyzed by calculation of LOD (log of the odds) values. A lod value is the 
relative likelihood of obtaining observed segregation data for a marker and a genetic locus 
when the two are located at a recombination fraction RF, versus the situation in which the 
two are not linked, and thus segregating independently (Thompson & Thompson, Genetics in 
Medicine (5th ed., W.B. Saunders Company, Philadelphia, 1991); Strachan, "Mapping the 
human genome" in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 
4). A series of likelihood ratios are calculated at various recombination fractions (RF), 
ranging from RF=0.0 (coincident loci) to RF=0.50 (unlinked). Thus, the likelihood at a given 
value of RF is: probability of data if loci linked at RF to probability of data if loci unlinked. 
The computed likelihood is usually expressed as the log10 of this ratio (i.e., a lod score). For 
example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a 
coincidence. The use of logarithms allows data collected from different families to be 
combined by simple addition. Computer programs are available for the calculation of lod 
scores for differing values of RF (e.g., LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. 
(USA) 81, 3443-3446 (1984))). For any particular lod score, a recombination fraction may be 
determined from mathematical tables. See Smith et al., Mathematical tables for research 
workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 
(1968). The value of RF at which the lod score is the highest is considered to be the best 
estimate of the recombination fraction. 

Positive lod score values suggest that the two loci are linked, whereas negative values 
suggest that linkage is less likely (at that value of RF) than the possibility that the two loci are 
unlinked. By convention, a combined lod score of +3 or greater (equivalent to greater than 
1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. 
Similarly, by convention, a negative lod score of -2 or less is taken as definitive evidence 
against linkage of the two loci being compared. Negative linkage data are useful in excluding 
a chromosome or a segment thereof from consideration. The search focuses on the remaining 
non-excluded chromosomal locations. 
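
The lod score calculation can be illustrated for the simplest case. The sketch below assumes 
phase-known, fully informative meioses, in which R recombinants among N meioses have likelihood 
RF^R (1-RF)^(N-R), compared against the unlinked value of 0.5^N. This simplification, the 
function names and the counts are assumptions made for illustration. Programs such as LIPED 
and MLINK handle general pedigrees and are not reproduced here. 

```python
from math import log10


def lod_score(recombinants, meioses, rf):
    """log10 of [likelihood at recombination fraction rf / likelihood at 0.5]."""
    if not 0.0 <= rf <= 0.5:
        raise ValueError("rf must lie between 0.0 and 0.5")
    nonrecombinants = meioses - recombinants
    if rf == 0.0:
        # Any observed recombinant is impossible for coincident loci.
        if recombinants > 0:
            return float("-inf")
        return meioses * log10(2.0)
    return (recombinants * log10(rf)
            + nonrecombinants * log10(1.0 - rf)
            - meioses * log10(0.5))


def combined_lod(families, rf):
    """Lod scores from different families are combined by simple addition."""
    return sum(lod_score(r, n, rf) for r, n in families)


def best_rf(recombinants, meioses, steps=50):
    """Grid search for the rf between 0.0 and 0.5 giving the highest lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda rf: lod_score(recombinants, meioses, rf))


# Hypothetical data: 1 recombinant among 10 informative meioses.
print(lod_score(1, 10, 0.1))
print(best_rf(1, 10))
# Two hypothetical families, as (recombinants, meioses) pairs.
print(combined_lod([(1, 10), (0, 8)], 0.05))
```

A combined value of +3 or more at the best-supported RF would meet the conventional threshold 
for linkage described above, and a value of -2 or less would be taken as evidence against it. 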

The invention further provides transgenic nonhuman animals capable of expressing an 
exogenous variant gene and/or having one or both alleles of an endogenous variant gene 
inactivated. Expression of an exogenous variant gene is usually achieved by operably 
linking the gene to a promoter and optionally an enhancer, and microinjecting the construct 
into a zygote. See Hogan et al., "Manipulating the Mouse Embryo, A Laboratory Manual," 
Cold Spring Harbor Laboratory (1989). Inactivation of endogenous variant genes can be 
achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of 
a positive selection marker. See Capecchi, Science 244, 1288-1292. The transgene is then 
introduced into an embryonic stem cell, where it undergoes homologous recombination with 
an endogenous variant gene. Mice and other rodents are preferred animals. Such animals 
provide useful drug screening systems. 

The invention further provides methods for assessing the pharmacogenomic 
susceptibility of a subject harboring a single nucleotide polymorphism to a particular 
pharmaceutical compound, or to a class of such compounds. Genetic polymorphisms in 
drug-metabolizing enzymes, drug transporters, receptors for pharmaceutical agents, and other drug 
targets have been correlated with individual differences based on distinction in the efficacy 
and toxicity of the pharmaceutical agent administered to a subject. Pharmacogenomic 
characterization of a subject's susceptibility to a drug enhances the ability to tailor a dosing 
regimen to the particular genetic constitution of the subject, thereby enhancing and 
optimizing the therapeutic effectiveness of the therapy. 

In cases in which a cSNP leads to a polymorphic protein that is ascribed to be the 
cause of a pathological condition, a method of treating such a condition includes administering 
to a subject experiencing the pathology the wild type cognate of the polymorphic protein. 
Once administered in an effective dosing regimen, the wild type cognate provides 
complementation or remediation of the defect due to the polymorphic protein. The subject's 
condition is ameliorated by this protein therapy. 

A subject suspected of suffering from a pathology ascribable to a polymorphic protein 
that arises from a cSNP is to be diagnosed using any of a variety of diagnostic methods 
capable of identifying the presence of the cSNP in the nucleic acid, or of the cognate 
polymorphic protein, in a suitable clinical sample taken from the subject. Once the presence 
of the cSNP has been ascertained, and the pathology is correctable by administering a normal 
or wild-type gene, the subject is treated with a pharmaceutical composition that includes a 
nucleic acid that harbors the correcting wild-type gene, or a fragment containing a correcting 
sequence of the wild-type gene. Non-limiting examples of ways in which such a nucleic acid 
may be administered include incorporating the wild-type gene in a viral vector, such as an 
adenovirus or adeno-associated virus, and administration of a naked DNA in a pharmaceutical 
composition that promotes intracellular uptake of the administered nucleic acid. Once the 
nucleic acid that includes the gene coding for the wild-type allele of the polymorphism is 
incorporated within a cell of the subject, it will initiate de novo biosynthesis of the wild-type 
gene product. If the nucleic acid is further incorporated into the genome of the subject, the 
treatment will have long-term effects, providing de novo synthesis of the wild-type protein 
for a prolonged duration. The synthesis of the wild-type protein in the cells of the subject 
will contribute to a therapeutic enhancement of the clinical condition of the subject. 

A subject suffering from a pathology ascribed to a SNP may be treated so as to correct 
the genetic defect. (See Kren et al., Proc. Natl. Acad. Sci. USA 96:10349-10354 (1999)). 
Such a subject is identified by any method that can detect the polymorphism in a sample 
drawn from the subject. Such a genetic defect may be permanently corrected by 
administering to such a subject a nucleic acid fragment incorporating a repair sequence that 
supplies the wild-type nucleotide at the position of the SNP. This site-specific repair 
sequence encompasses an RNA/DNA oligonucleotide which operates to promote endogenous 
repair of a subject's genomic DNA. Upon administration in an appropriate vehicle, such as a 
complex with polyethylenimine or encapsulated in anionic liposomes, a genetic defect 
leading to an inborn pathology may be overcome, as the chimeric oligonucleotide induces 
incorporation of the wild-type sequence into the subject's genome. Upon incorporation, the 
wild-type gene product is expressed, and the replacement is propagated, thereby engendering 
a permanent repair. 

The invention further provides kits comprising at least one allele-specific 
oligonucleotide as described above. Often, the kits contain one or more pairs of 
allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, 
the allele-specific oligonucleotides are provided immobilized to a substrate. For example, 
the same substrate can comprise allele-specific oligonucleotide probes for detecting at least 
10, 100, 1000 or all of the polymorphisms shown in the Table. Optional additional 
components of the kit include, for example, restriction enzymes, reverse-transcriptase or 
polymerase, the substrate nucleoside triphosphates, means used to label (for example, an 
avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the 
appropriate buffers for reverse transcription, PCR, or hybridization reactions. Usually, the 
kit also contains instructions for carrying out the hybridizing methods. 

Several aspects of the present invention rely on having available the polymorphic 
proteins encoded by the nucleic acids comprising a SNP of the invention. There are various 
methods of isolating these nucleic acid sequences. For example, DNA is isolated from a 
genomic or cDNA library using labeled oligonucleotide probes having sequences 
complementary to the sequences disclosed herein. 

Such probes can be used directly in hybridization assays. Alternatively probes can be 
designed for use in amplification techniques such as PCR. 

To prepare a cDNA library, mRNA is isolated from tissue such as heart or pancreas, 
preferably a tissue wherein expression of the gene or gene family is likely to occur. cDNA is 
prepared from the mRNA and ligated into a recombinant vector. The vector is transfected 
into a recombinant host for propagation, screening and cloning. Methods for making and 
screening cDNA libraries are well known. See Gubler, U. and Hoffman, B.J., Gene 25:263-269 
(1983) and Sambrook et al. 

For a genomic library, for example, the DNA is extracted from tissue and either 
mechanically sheared or enzymatically digested to yield fragments of about 12-20 kb. The 
fragments are then separated by gradient centrifugation from undesired sizes and are 
constructed in bacteriophage lambda vectors. These vectors and phage are packaged in vitro, 
as described in Sambrook et al. Recombinant phage are analyzed by plaque hybridization as 
described in Benton and Davis, Science 196:180-182 (1977). Colony hybridization is carried 
out as generally described in M. Grunstein et al., Proc. Natl. Acad. Sci. USA 72:3961-3965 
(1975). DNA of interest is identified in either cDNA or genomic libraries by its ability 
to hybridize with nucleic acid probes, for example on Southern blots, and these DNA regions 
are isolated by standard methods familiar to those of skill in the art. See Sambrook et al. 

In PCR techniques, oligonucleotide primers complementary to the two 3' borders of 
the DNA region to be amplified are synthesized. The polymerase chain reaction is then 
carried out using the two primers. See PCR Protocols: A Guide to Methods and Applications 
(Innis, M., Gelfand, D., Sninsky, J. and White, T., eds.), Academic Press, San Diego (1990). 
Primers can be selected to amplify the entire regions encoding a full-length sequence of 
interest or to amplify smaller DNA segments as desired. PCR can be used in a variety of 
protocols to isolate cDNAs encoding a sequence of interest. In these protocols, appropriate 
primers and probes for amplifying DNA encoding a sequence of interest are generated from 
analysis of the DNA sequences listed herein. Once such regions are PCR-amplified, they can 
be sequenced and oligonucleotide probes can be prepared from the sequence. 
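
The selection of a primer pair bordering a region can be sketched in Python as follows. This 
is a minimal illustration only: the function names, the primer length and the example sequence 
are hypothetical, and no account is taken of melting temperature, secondary structure or 
primer specificity. 

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(template, start, end, primer_length=20):
    """Pick a primer pair bordering template[start:end] (top strand, 5'->3').

    The forward primer copies the 5' border of the top strand and so anneals
    to the 3' border of the bottom strand. The reverse primer is the reverse
    complement of the other border and anneals to the 3' border of the top
    strand. Both primers are returned 5'->3'.
    """
    if end - start < 2 * primer_length:
        raise ValueError("region is too short for two non-overlapping primers")
    region = template[start:end]
    forward = region[:primer_length]
    reverse = reverse_complement(region[-primer_length:])
    return forward, reverse


# Hypothetical 16-base region and 5-base primers, for readability only.
forward, reverse = flanking_primers("AAACCCGGGTTTACGT", 0, 16, primer_length=5)
print(forward, reverse)
```

Each primer is thus complementary to the 3' border of one of the two strands, so that 
extension from both primers proceeds across the region to be amplified. 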

Once DNA encoding a sequence comprising a cSNP is isolated and cloned, one can 
express the encoded polymorphic proteins in a variety of recombinantly engineered cells. It 
is expected that those of skill in the art are knowledgeable in the numerous expression 
systems available for expression of DNA encoding a sequence of interest. No attempt to 
describe in detail the various methods known for the expression of proteins in prokaryotes or 
eukaryotes is made here. 

In brief summary, the expression of natural or synthetic nucleic acids encoding a 
sequence of interest will typically be achieved by operably linking the DNA or cDNA to a 
promoter (which is either constitutive or inducible), followed by incorporation into an 
expression vector. The vectors can be suitable for replication and integration in either 
prokaryotes or eukaryotes. Typical expression vectors contain initiation sequences, 
transcription and translation terminators, and promoters useful for regulation of the 
expression of a polynucleotide sequence of interest. To obtain high level expression of a 
cloned gene, it is desirable to construct expression plasmids which contain, at the minimum, a 
strong promoter to direct transcription, a ribosome binding site for translational initiation, and 
a transcription/translation terminator. The expression vectors may also comprise generic 
expression cassettes containing at least one independent terminator sequence, sequences 
permitting replication of the plasmid in both eukaryotes and prokaryotes, i.e., shuttle vectors, 
and selection markers for both prokaryotic and eukaryotic systems. See Sambrook et al. 

A variety of prokaryotic expression systems may be used to express the polymorphic
proteins of the invention. Examples include E. coli, Bacillus, Streptomyces, and the like.

It is preferred to construct expression plasmids which contain, at the minimum, a
strong promoter to direct transcription, a ribosome binding site for translational initiation, and
a transcription/translation terminator. Examples of regulatory regions suitable for this
purpose in E. coli are the promoter and operator region of the E. coli tryptophan biosynthetic
pathway as described by Yanofsky, C., J. Bacteriol. 158:1018-1024 (1984) and the leftward
promoter of phage lambda as described by Herskowitz, I. and Hagen, D., Ann. Rev. Genet.
14:399-445 (1980). The inclusion of selection markers in DNA vectors transformed in E. coli
is also useful. Examples of such markers include genes specifying resistance to ampicillin,
tetracycline, or chloramphenicol. See Sambrook et al. for details concerning selection
markers for use in E. coli.

To enhance proper folding of the expressed recombinant protein, during purification
from E. coli, the expressed protein may first be denatured and then renatured. This can be
accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as
guanidine HCl and reducing all the cysteine residues with a reducing agent such as
beta-mercaptoethanol. The protein is then renatured, either by slow dialysis or by gel filtration.
See U.S. Patent No. 4,511,503. Detection of the expressed antigen is achieved by methods
known in the art, such as radioimmunoassay, Western blotting techniques or
immunoprecipitation. Purification from E. coli can be achieved following procedures such
as those described in U.S. Patent No. 4,511,503.

Any of a variety of eukaryotic expression systems such as yeast, insect cell lines, bird,
fish, and mammalian cells, may also be used to express a polymorphic protein of the
invention. As explained briefly below, a nucleotide sequence harboring a cSNP may be
expressed in these eukaryotic systems. Synthesis of heterologous proteins in yeast is well
known. Methods in Yeast Genetics, Sherman, F., et al., Cold Spring Harbor Laboratory,
(1982) is a well recognized work describing the various methods available to produce the
protein in yeast. Suitable vectors usually have expression control sequences, such as
promoters, including 3-phosphoglycerate kinase or other glycolytic enzymes, and an origin of
replication, termination sequences and the like as desired. For instance, suitable vectors are
described in the literature (Botstein, et al., Gene 8:17-24 (1979); Broach, et al., Gene
8:121-133 (1979)).

Two procedures are used in transforming yeast cells. In one case, yeast cells are first
converted into protoplasts using zymolyase, lyticase or glusulase, followed by addition of
DNA and polyethylene glycol (PEG). The PEG-treated protoplasts are then regenerated in a
3% agar medium under selective conditions. Details of this procedure are given in the papers
by J.D. Beggs, Nature (London) 275:104-109 (1978); and Hinnen, A., et al., Proc. Natl.
Acad. Sci. USA 75:1929-1933 (1978). The second procedure does not involve removal of
the cell wall. Instead the cells are treated with lithium chloride or acetate and PEG and put
on selective plates (Ito, H., et al., J. Bact. 153:163-168 (1983)). The protein can be isolated
from yeast by lysing the cells and applying standard protein isolation techniques to the lysates.
The purification process can be monitored by using Western blot techniques or
radioimmunoassay or other standard techniques. The sequences encoding the proteins of the
invention can also be ligated to various immunoassay expression vectors for use in
transforming cell cultures of, for instance, mammalian, insect, bird or fish origin. Illustrative
of cell cultures useful for the production of the polypeptides are mammalian cells.
Mammalian cell systems often will be in the form of monolayers of cells although
mammalian cell suspensions may also be used. A number of suitable host cell lines capable
of expressing intact proteins have been developed in the art, and include the HEK293,
BHK21, and CHO cell lines, and various human cells such as COS cell lines, HeLa cells,
myeloma cell lines, Jurkat cells, etc. Expression vectors for these cells can include
expression control sequences, such as an origin of replication, a promoter (e.g., the CMV
promoter, a HSV tk promoter or pgk (phosphoglycerate kinase) promoter), an enhancer
(Queen et al., Immunol. Rev. 89:49 (1986)) and necessary processing information sites, such
as ribosome binding sites, RNA splice sites, polyadenylation sites (e.g., an SV40 large T Ag
poly A addition site), and transcriptional terminator sequences.

Other animal cells are available, for instance, from the American Type Culture
Collection Catalogue of Cell Lines and Hybridomas (7th edition, (1992)). Appropriate
vectors for expressing the proteins of the invention in insect cells are usually derived from
baculovirus. Insect cell lines include mosquito larvae, silkworm, armyworm, moth and
Drosophila cell lines such as a Schneider cell line (see Schneider, J. Embryol. Exp. Morphol.
27:353-365 (1987)). As indicated above, the vector, e.g., a plasmid, which is used to
transform the host cell, preferably contains DNA sequences to initiate transcription and
sequences to control the translation of the protein. These sequences are referred to as
expression control sequences. As with yeast, when higher animal host cells are employed,
polyadenylation or transcription terminator sequences from known mammalian genes need to
be incorporated into the vector. An example of a terminator sequence is the polyadenylation
sequence from the bovine growth hormone gene. Sequences for accurate splicing of the
transcript may also be included. An example of a splicing sequence is the VP1 intron from
SV40 (Sprague, J., et al., J. Virol. 45:773-781 (1983)). Additionally, gene sequences to
control replication in the host cell may be incorporated into the vector, such as those found in
bovine papilloma virus-type vectors. See Saveria-Campo, M., 1985, "Bovine Papilloma
Virus DNA: a Eukaryotic Cloning Vector" in DNA Cloning Vol. II, a Practical Approach, Ed.
D.M. Glover, IRL Press, Arlington, Virginia, pp. 213-238. The host cells are competent or
rendered competent for transformation by various means. There are several well-known
methods of introducing DNA into animal cells. These include: calcium phosphate
precipitation, fusion of the recipient cells with bacterial protoplasts containing the DNA,
treatment of the recipient cells with liposomes containing the DNA, DEAE dextran,
electroporation and micro-injection of the DNA directly into the cells.

The transformed cells are cultured by means well known in the art (Biochemical
Methods in Cell Culture and Virology, Kuchler, R.J., Dowden, Hutchinson and Ross, Inc.,
(1977)). The expressed polypeptides are isolated from cells grown as suspensions or as
monolayers. The latter are recovered by well known mechanical, chemical or enzymatic
means.

General methods of expressing recombinant proteins are also known and are
exemplified in R. Kaufman, Methods in Enzymology 185:537-566 (1990). As defined
herein "operably linked" refers to linkage of a promoter upstream from a DNA sequence such
that the promoter mediates transcription of the DNA sequence. Specifically, "operably
linked" means that the isolated polynucleotide of the invention and an expression control
sequence are situated within a vector or cell in such a way that the gene encoding the protein
is expressed by a host cell which has been transformed (transfected) with the ligated
polynucleotide/expression sequence. The term "vector" refers to viral expression systems,
autonomous self-replicating circular DNA (plasmids), and includes both expression and
nonexpression plasmids.

The term "gene" as used herein is intended to refer to a nucleic acid sequence which
encodes a polypeptide. This definition includes various sequence polymorphisms, mutations,
and/or sequence variants wherein such alterations do not affect the function of the gene
product. The term "gene" is intended to include not only coding sequences but also
regulatory regions such as promoters, enhancers, termination regions and similar untranslated
nucleotide sequences. The term further includes all introns and other DNA sequences spliced
from the mRNA transcript, along with variants resulting from alternative splice sites.

A number of types of cells may act as suitable host cells for expression of the protein.
Mammalian host cells include, for example, monkey COS cells, Chinese Hamster Ovary
(CHO) cells, human kidney 293 cells, human epidermal A431 cells, human Colo205 cells,
3T3 cells, CV-1 cells, other transformed primate cell lines, normal diploid cells, cell strains
derived from in vitro culture of primary tissue, primary explants, HeLa cells, mouse L cells,
BHK, HL-60, U937, HaK or Jurkat cells. Alternatively, it may be possible to produce the
protein in lower eukaryotes such as yeast or in prokaryotes such as bacteria. Potentially
suitable yeast strains include Saccharomyces cerevisiae, Schizosaccharomyces pombe,
Kluyveromyces strains, Candida or any yeast strain capable of expressing heterologous
proteins. Potentially suitable bacterial strains include Escherichia coli, Bacillus subtilis,
Salmonella typhimurium, or any bacterial strain capable of expressing heterologous proteins.
If the protein is made in yeast or bacteria, it may be necessary to modify the protein produced
therein, for example by phosphorylation or glycosylation of the appropriate sites, in order to
obtain the functional protein.

The protein may also be produced by operably linking the isolated polynucleotide of
the invention to suitable control sequences in one or more insect expression vectors, and
employing an insect expression system. Materials and methods for baculovirus/insect cell
expression systems are commercially available in kit form from, e.g., Invitrogen, San Diego,
California, U.S.A. (the MaxBac® kit), and such methods are well known in the art, as
described in Summers and Smith, Texas Agricultural Experiment Station Bulletin No. 1555
(1987), incorporated herein by reference. As used herein, an insect cell capable of
expressing a polynucleotide of the present invention is "transformed." The protein of the
invention may be prepared by culturing transformed host cells under culture conditions
suitable to express the recombinant protein.

The polymorphic protein of the invention may also be expressed as a product of 
transgenic animals, e.g., as a component of the milk of transgenic cows, goats, pigs, or sheep 
which are characterized by somatic or germ cells containing a nucleotide sequence 
encoding the protein. The protein may also be produced by known conventional chemical 
synthesis. Methods for constructing the proteins of the present invention by synthetic 
means are known to those skilled in the art. 

The polymorphic proteins produced by recombinant DNA technology may be purified
by techniques commonly employed to isolate or purify recombinant proteins. Recombinantly
produced proteins can be directly expressed or expressed as a fusion protein. The protein is
then purified by a combination of cell lysis (e.g., sonication) and affinity chromatography.
For fusion products, subsequent digestion of the fusion protein with an appropriate
proteolytic enzyme releases the desired polypeptide. The polypeptides of this invention may
be purified to substantial purity by standard techniques well known in the art, including
selective precipitation with such substances as ammonium sulfate, column chromatography,
immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification:
Principles and Practice, Springer-Verlag: New York (1982), incorporated herein by reference.
For example, in an embodiment, antibodies may be raised to the proteins of the invention as
described herein. Cell membranes are isolated from a cell line expressing the recombinant
protein, the protein is extracted from the membranes and immunoprecipitated. The proteins
may then be further purified by standard protein chemistry techniques as described above.

The resulting expressed protein may then be purified from such culture (i.e., from
culture medium or cell extracts) using known purification processes, such as gel filtration
and ion exchange chromatography. The purification of the protein may also include an
affinity column containing agents which will bind to the protein; one or more column steps
over such affinity resins as concanavalin A-agarose, heparin-Toyopearl® or Cibacron blue
3GA Sepharose®; one or more steps involving hydrophobic interaction chromatography
using such resins as phenyl ether, butyl ether, or propyl ether; or immunoaffinity
chromatography. Alternatively, the protein of the invention may also be expressed in a form
which will facilitate purification. For example, it may be expressed as a fusion protein, such
as those of maltose binding protein (MBP), glutathione-S-transferase (GST) or thioredoxin
(TRX). Kits for expression and purification of such fusion proteins are commercially
available from New England BioLabs (Beverly, MA), Pharmacia (Piscataway, NJ) and
Invitrogen, respectively. The protein can also be tagged with an epitope and subsequently
purified by using a specific antibody directed to such epitope. One such epitope ("Flag") is
commercially available from Kodak (New Haven, CT). Finally, one or more reverse-phase
high performance liquid chromatography (RP-HPLC) steps employing hydrophobic RP-HPLC
media, e.g., silica gel having pendant methyl or other aliphatic groups, can be
employed to further purify the protein. Some or all of the foregoing purification steps, in
various combinations, can also be employed to provide a substantially homogeneous
isolated recombinant protein. The protein thus purified is substantially free of other
mammalian proteins and is defined in accordance with the present invention as an "isolated
protein."



The term "antibody" as used herein refers to immunoglobulin molecules and
immunologically active portions of immunoglobulin molecules, i.e., molecules that contain
an antigen binding site that specifically binds (immunoreacts with) an antigen, such as a
polymorphic protein. Such antibodies include, but are not limited to, polyclonal, monoclonal,
chimeric, single chain, and F(ab')2 fragments, and an expression library. In a specific
embodiment, antibodies to human polymorphic proteins are disclosed.

The phrase "specifically binds to", "immunospecifically binds to" or "specifically
immunoreactive with" an antibody, when referring to a protein or peptide, refers to a binding
reaction which is determinative of the presence of the protein in the presence of a
heterogeneous population of proteins and other biological materials. Thus, for example,
under designated immunoassay conditions, the specified antibodies bind to a particular
protein and do not bind in a significant amount to other proteins present in the sample.
Specific binding to an antibody under such conditions may require an antibody that is
selected for its specificity for a particular protein. Of particular interest in the present
invention is an antibody that binds immunospecifically to a polymorphic protein but not to its
cognate wild type allelic protein, or vice versa. A variety of immunoassay formats may be
used to select antibodies specifically immunoreactive with a particular protein. For example,
solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies
specifically immunoreactive with a protein. See Harlow and Lane (1988) Antibodies, A
Laboratory Manual, Cold Spring Harbor Publications, New York, for a description of
immunoassay formats and conditions that can be used to determine specific
immunoreactivity.

Polyclonal and/or monoclonal antibodies that immunospecifically bind to
polymorphic gene products but not to the corresponding prototypical or "wild-type" gene
products are also provided. Antibodies can be made by injecting mice or other animals with
the variant gene product or synthetic peptide. Monoclonal antibodies are screened as
described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring
Harbor Press, New York (1988); Goding, Monoclonal Antibodies: Principles and Practice
(2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific
immunoreactivity with a variant gene product and lack of immunoreactivity to the
corresponding prototypical gene product.

An isolated polymorphic protein, or a portion or fragment thereof, can be used as an
immunogen to generate the antibody that binds the polymorphic protein using standard
techniques for polyclonal and monoclonal antibody preparation. The full-length polymorphic
protein can be used or, alternatively, the invention provides antigenic peptide fragments of
the polymorphic protein for use as immunogens. The antigenic peptide of a polymorphic
protein of the invention comprises at least 8 amino acid residues of the amino acid sequence
encompassing the polymorphic amino acid and encompasses an epitope of the polymorphic
protein such that an antibody raised against the peptide forms a specific immune complex
with the polymorphic protein. Preferably, the antigenic peptide comprises at least 10 amino acid
residues, more preferably at least 15 amino acid residues, even more preferably at least 20
amino acid residues, and most preferably at least 30 amino acid residues. Preferred epitopes
encompassed by the antigenic peptide are regions of the polymorphic protein that are located
on the surface of the protein, e.g., hydrophilic regions.
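
As an illustration of the length requirement above, the following minimal sketch enumerates every candidate peptide of a chosen length that encompasses the polymorphic amino acid. The function name, the 0-based indexing convention, and the example sequence are assumptions made for illustration only; the sketch does not assess surface exposure, hydrophilicity, or immunogenicity.

```python
# Illustrative only: list peptides of a given length that contain one residue.
def peptide_windows(protein: str, site: int, length: int = 8) -> list[str]:
    """Return all `length`-residue peptides of `protein` containing index `site`.

    `site` is the 0-based position of the polymorphic amino acid.
    """
    if not 0 <= site < len(protein):
        raise ValueError("site is outside the protein sequence")
    if length > len(protein):
        raise ValueError("peptide length exceeds protein length")
    # Earliest window start that still reaches the site, clipped at the N-terminus.
    first = max(0, site - length + 1)
    # Latest window start that still fits before the C-terminus.
    last = min(site, len(protein) - length)
    return [protein[i:i + length] for i in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical 16-residue sequence with the polymorphic residue at index 9.
    for peptide in peptide_windows("MKTAYIAKQRQISFVK", site=9, length=8):
        print(peptide)
```

Longer peptides (10, 15, 20, or 30 residues) are obtained by changing the `length` argument.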

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit,
goat, mouse or other mammal) may be immunized by injection with the polymorphic protein.
An appropriate immunogenic preparation can contain, for example, recombinantly expressed
polymorphic protein or a chemically synthesized polymorphic polypeptide. The preparation
can further include an adjuvant. Various adjuvants used to increase the immunological
response include, but are not limited to, Freund's (complete and incomplete), mineral gels
(e.g., aluminum hydroxide), surface active substances (e.g., lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, etc.), human adjuvants such as Bacille
Calmette-Guerin and Corynebacterium parvum, or similar immunostimulatory agents. If
desired, the antibody molecules directed against polymorphic proteins can be isolated from
the mammal (e.g., from the blood) and further purified by well known techniques, such as
protein A chromatography, to obtain the IgG fraction.

The term "monoclonal antibody" or "monoclonal antibody composition", as used
herein, refers to a population of antibody molecules that originates from the clone of a single
hybridoma cell, and that contains only one type of antigen binding site capable of
immunoreacting with a particular epitope of a polymorphic protein. A monoclonal antibody
composition thus typically displays a single binding affinity for a particular polymorphic
protein with which it immunoreacts. For preparation of monoclonal antibodies directed
towards a particular polymorphic protein, or derivatives, fragments, analogs or homologs
thereof, any technique that provides for the production of antibody molecules by continuous
cell line culture may be utilized. Such techniques include, but are not limited to, the
hybridoma technique (see Kohler & Milstein, 1975 Nature 256: 495-497); the trioma
technique; the human B-cell hybridoma technique (see Kozbor, et al., 1983 Immunol Today
4: 72) and the EBV hybridoma technique to produce human monoclonal antibodies (see Cole,
et al., 1985 In: MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp.
77-96). Human monoclonal antibodies may be utilized in the practice of the present
invention and may be produced by using human hybridomas (see Cote, et al., 1983. Proc
Natl Acad Sci USA 80: 2026-2030) or by transforming human B-cells with Epstein Barr Virus
in vitro (see Cole, et al., 1985 In: MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R.
Liss, Inc., pp. 77-96).

According to the invention, techniques can be adapted for the production of
single-chain antibodies specific to a polymorphic protein (see e.g., U.S. Patent No.
4,946,778). In addition, methodologies can be adapted for the construction of Fab expression
libraries (see e.g., Huse, et al., 1989 Science 246: 1275-1281) to allow rapid and effective
identification of monoclonal Fab fragments with the desired specificity for a polymorphic
protein or derivatives, fragments, analogs or homologs thereof. Non-human antibodies can
be "humanized" by techniques well known in the art. See e.g., U.S. Patent No. 5,225,539.
Antibody fragments that contain the idiotypes to a polymorphic protein may be produced by
techniques known in the art including, but not limited to: (i) an F(ab')2 fragment produced by
pepsin digestion of an antibody molecule; (ii) an Fab fragment generated by reducing the
disulfide bridges of an F(ab')2 fragment; (iii) an Fab fragment generated by the treatment of the
antibody molecule with papain and a reducing agent; and (iv) Fv fragments.

Additionally, recombinant anti-polymorphic protein antibodies, such as chimeric and
humanized monoclonal antibodies, comprising both human and non-human portions, which
can be made using standard recombinant DNA techniques, are within the scope of the
invention. Such chimeric and humanized monoclonal antibodies can be produced by
recombinant DNA techniques known in the art, for example using methods described in PCT
International Application No. PCT/US86/02269; European Patent Application No. 184,187;
European Patent Application No. 171,496; European Patent Application No. 173,494; PCT

In one embodiment, methodologies for the screening of antibodies that possess the
desired specificity include, but are not limited to, enzyme-linked immunosorbent assay
(ELISA) and other immunologically-mediated techniques known within the art.

Anti-polymorphic protein antibodies may be used in methods known within the art
relating to the detection, quantitation and/or cellular or tissue localization of a polymorphic
protein (e.g., for use in measuring levels of the polymorphic protein within appropriate
physiological samples, for use in diagnostic methods, for use in imaging the protein, and the
like). In a given embodiment, antibodies for polymorphic proteins, or derivatives, fragments,
analogs or homologs thereof, that contain the antibody-derived CDR, are utilized as
pharmacologically-active compounds in therapeutic applications intended to treat a pathology
in a subject that arises from the presence of the cSNP allele in the subject.

An anti-polymorphic protein antibody (e.g., monoclonal antibody) can be used to
isolate polymorphic proteins by a variety of immunochemical techniques, such as
immunoaffinity chromatography or immunoprecipitation. An anti-polymorphic protein
antibody can facilitate the purification of natural polymorphic protein from cells and of
recombinantly produced polymorphic proteins expressed in host cells. Moreover, an
anti-polymorphic protein antibody can be used to detect polymorphic protein (e.g., in a
cellular lysate or cell supernatant) in order to evaluate the abundance and pattern of
expression of the polymorphic protein. Anti-polymorphic antibodies can be used
diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g.,
to determine the efficacy of a given treatment regimen. Detection can be
facilitated by coupling (i.e., physically linking) the antibody to a detectable substance.
Examples of detectable substances include various enzymes, prosthetic groups, fluorescent
materials, luminescent materials, bioluminescent materials, and radioactive materials.
Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase,
β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes
include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials
include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine,
dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a
luminescent material includes luminol; examples of bioluminescent materials include
luciferase, luciferin, and aequorin; and examples of suitable radioactive material include
125I, 35S or 3H.

Equivalents

From the foregoing detailed description of the specific embodiments of the invention,
it should be apparent that unique compositions and methods of use thereof in SNPs in known
genes have been described. Although particular embodiments have been disclosed herein in
detail, this has been done by way of example for purposes of illustration only, and is not
intended to be limiting with respect to the scope of the appended claims which follow. In
particular, it is contemplated by the inventor that various substitutions, alterations, and
modifications may be made to the invention without departing from the spirit and scope of
the invention as defined by the claims.

What is claimed is:

1. An isolated polynucleotide selected from the group consisting of:

a) a nucleotide sequence comprising one or more polymorphic sequences
selected from the group consisting of SEQ ID NOS: 1-7867;

b) a fragment of said nucleotide sequence, provided that the fragment
includes a polymorphic site in said polymorphic sequence;

c) a complementary nucleotide sequence comprising a sequence
complementary to one or more of said polymorphic sequences selected
from the group consisting of SEQ ID NOS: 1-7867; and

d) a fragment of said complementary nucleotide sequence, provided that the
fragment includes a polymorphic site in said polymorphic sequence.

2. The polynucleotide of claim 1, wherein said polynucleotide sequence is DNA.

3. The polynucleotide of claim 1, wherein said polynucleotide sequence is RNA.

4. The polynucleotide of claim 1, wherein said polynucleotide sequence is between
about 10 and about 100 nucleotides in length.

5. The polynucleotide of claim 1, wherein said polynucleotide sequence is between
about 10 and about 90 nucleotides in length.

6. The polynucleotide of claim 1, wherein said polynucleotide sequence is between
about 10 and about 75 nucleotides in length.

7. The polynucleotide of claim 1, wherein said polynucleotide is between about 10 and
about 50 bases in length.

8. The polynucleotide of claim 1, wherein said polynucleotide is between about 10 and
about 40 bases in length.

9. The polynucleotide of claim 1, wherein said polynucleotide is between about 15 and
about 30 bases in length.

10. The polynucleotide of claim 1, wherein said polymorphic site includes a nucleotide
other than the nucleotide listed in Table 1, column 5 for said polymorphic sequence.

11. The polynucleotide of claim 1, wherein the complement of said polymorphic site
includes a nucleotide other than the complement of the nucleotide listed in Table 1,
column 5 for the complement of said polymorphic sequence.

12. The polynucleotide of claim 1, wherein said polymorphic site includes the nucleotide
listed in Table 1, column 6 for said polymorphic sequence.

13. The polynucleotide of claim 1, wherein the complement of said polymorphic site
includes the complement of the nucleotide listed in Table 1, column 6 for said
polymorphic sequence.



14. An isolated allele-specific oligonucleotide that hybridizes to a first polynucleotide at a
polymorphic site encompassed therein, wherein the first polynucleotide is selected
from the group consisting of:

a) a nucleotide sequence comprising one or more polymorphic sequences
selected from the group consisting of SEQ ID NOS: 1-7867, provided that
the polymorphic sequence includes a nucleotide other than the nucleotide
recited in Table 1, column 5 for said polymorphic sequence;

b) a nucleotide sequence that is a fragment of said polymorphic sequence,
provided that the fragment includes a polymorphic site in said polymorphic
sequence;

c) a complementary nucleotide sequence comprising a sequence
complementary to one or more polymorphic sequences selected from the
group consisting of SEQ ID NOS: 1-7867, provided that the
complementary nucleotide sequence includes a nucleotide other than the
complement of the nucleotide recited in Table 1, column 5; and

d) a nucleotide sequence that is a fragment of said complementary sequence,
provided that the fragment includes a polymorphic site in said polymorphic
sequence.

15. The oligonucleotide of claim 14, wherein the oligonucleotide does not hybridize
under stringent conditions to a second polynucleotide selected from the group
consisting of:

a) a nucleotide sequence comprising one or more polymorphic sequences
selected from the group consisting of SEQ ID NOS: 1-7867, wherein said
polymorphic sequence includes the nucleotide listed in Table 1, column 5
for said polymorphic sequence;

b) a nucleotide sequence that is a fragment of any of said nucleotide
sequences;

c) a complementary nucleotide sequence comprising a sequence
complementary to one or more polymorphic sequences selected from the
group consisting of SEQ ID NOS: 1-7867, wherein said polymorphic
sequence includes the complement of the nucleotide listed in Table 1,
column 5; and

d) a nucleotide sequence that is a fragment of said complementary sequence,
provided that the fragment includes a polymorphic site in said polymorphic
sequence.

16. The oligonucleotide of claim 15, wherein the oligonucleotide is between about 10 and
about 51 bases in length.

17. The oligonucleotide of claim 15, wherein the oligonucleotide is between about 10 and
about 40 bases in length.

18. The oligonucleotide of claim 15, wherein the oligonucleotide is between about 15 and
about 30 bases in length.

A method of detecting a polymorphic she in a nucleic acid, tiie metiiod comprising: 
a) contacting said nucleic acid witii an oligonucleotide tiiat hybridizes to a 
polymorphic sequence selected from tiie group consisting of SEQ ID NOS: 
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20. 



21. 



22. 
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1-7867, or its complement, provided that the polymorphic sequence 
includes a nucleotide other than the nucleotide recited in Table 1, column 5 
for said polymoiphic sequence, or the complement includes a nucleotide 
other than the complement of the nucleotide recited in Table 1 , column 5; 



and 



b) determining whether said nucleic acid and said oligonucleotide hybridize; 
whereby hybridization of said oligonucleotide to said nucleic acid sequence indicate 
the presence of the polymorphic site in said nucleic acid. 

The method of claim 19, wherein said oligonucleotide does not hybridize to said 
polymorphic sequence when said polymorphic sequence includes the nucleotide 
recited in Table 1. cohmm 5 for said polymorphic sequence, or when the complement 
of the polymoiphic sequence includes the complement of the nucleotide recited in 
Table 1, column 5 for said polymorphic sequence. 

nie method of claim 19, wherein said oligonucleotide is between about 10 and about 
51 bases in length. 

■ 

The method of claim 19, wherein said oligonucleotide is between about 1 0 and about 
40 bases in length. 



23. A method of detecting the presence of a sequence polymorphism in a subject, the
method comprising:

a) providing a nucleic acid from said subject;

b) contacting said nucleic acid with an oligonucleotide that hybridizes to a
polymorphic sequence selected from the group consisting of SEQ ID NOS:
1-7867, or its complement, provided that the polymorphic sequence
includes a nucleotide other than the nucleotide recited in Table 1, column 5
for said polymorphic sequence, or the complement includes a nucleotide
other than the complement of the nucleotide recited in Table 1, column 5;
and

c) determining whether said nucleic acid and said oligonucleotide hybridize;
whereby hybridization of said oligonucleotide to said nucleic acid sequence indicates
the presence of the polymorphism in said subject.

24. A method of determining the relatedness of a first and second nucleic acid, the
method comprising:

a) providing a first nucleic acid and a second nucleic acid; 

b) contacting said first nucleic acid and said second nucleic acid with an 
oligonucleotide that hybridizes to a polymorphic sequence selected from
the group consisting of SEQ ID NOS: 1-7867, or its complement, provided 
that the polymorphic sequence includes a nucleotide other than the 
nucleotide recited in Table 1, column 5 for said polymorphic sequence, or 
the complement includes a nucleotide other than the complement of the 
nucleotide recited in Table 1, column 5; 

c) determining whether said first nucleic acid and said second nucleic acid 
hybridize to said oligonucleotide; and 

d) comparing hybridization of said first and second nucleic acids to said 
oligonucleotide, wherein hybridization of first and second nucleic acids 
to said nucleic acid indicates the first and second subjects are related. 



25. The method of claim 24, wherein said oligonucleotide does not hybridize to said
polymorphic sequence when said polymorphic sequence includes the nucleotide
recited in Table 1, column 5 for said polymorphic sequence, or when the complement
of the polymorphic sequence includes the complement of the nucleotide recited in
Table 1, column 5 for said polymorphic sequence.

26. The method of claim 24, wherein the oligonucleotide is between about 10 and about
51 bases in length.

27. The method of claim 24, wherein the oligonucleotide is between about 10 and about
40 bases in length.

28. The method of claim 24, wherein the oligonucleotide is between about 15 and about
30 bases in length.

29. An isolated polypeptide comprising a polymorphic site at one or more amino acid
residues, wherein the protein is encoded by a polynucleotide selected from the group
consisting of polymorphic sequences SEQ ID NOS: 1-7867, or their complement,
provided that the polymorphic sequence includes a nucleotide other than the
nucleotide recited in Table 1, column 5 for said polymorphic sequence, or the
complement includes a nucleotide other than the complement of the nucleotide recited
in Table 1, column 5.



30. The polypeptide of claim 29, wherein said polypeptide is translated in the same open 
reading frame as is a wild type protein whose amino acid sequence is identical to the
amino acid sequence of the polymorphic protein except at the site of the 
polymorphism. 



31. The polypeptide of claim 29, wherein the polypeptide encoded by said polymorphic
sequence, or its complement, includes the nucleotide listed in Table 1, column 6 for
said polymorphic sequence, or the complement includes the complement of the
nucleotide listed in Table 1, column 6.

32. An antibody that binds specifically to a polypeptide encoded by a polynucleotide
comprising a nucleotide sequence selected from the group consisting of polymorphic
sequences SEQ ID NOS: 1-7867, or its complement, provided that the polymorphic
sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5
for said polymorphic sequence, or the complement includes a nucleotide other than
the complement of the nucleotide recited in Table 1, column 5.

33. The antibody of claim 32, wherein said antibody binds specifically to a polypeptide
encoded by a polymorphic sequence which includes the nucleotide listed in Table 1,
column 6 for said polymorphic sequence.

34. The antibody of claim 32, wherein said antibody does not bind specifically to a
polypeptide encoded by a polymorphic sequence which includes the nucleotide listed
in Table 1, column 5 for said polymorphic sequence.

35. A method of detecting the presence of a polypeptide having one or more amino acid
residue polymorphisms in a subject, the method comprising:

a) providing a protein sample from said subject;

b) contacting said sample with the antibody of claim 34 under conditions that
allow for the formation of antibody-antigen complexes; and

c) detecting said antibody-antigen complexes,

whereby the presence of said complexes indicates the presence of said polypeptide.

36. A method of treating a subject suffering from, at risk for, or suspected of, suffering
from a pathology ascribed to the presence of a sequence polymorphism in a subject,
the method comprising:

a) providing a subject suffering from a pathology associated with aberrant
expression of a first nucleic acid comprising a polymorphic sequence
selected from the group consisting of SEQ ID NOS: 1-7867, or its
complement; and

b) administering to the subject an effective therapeutic dose of a second
nucleic acid comprising the polymorphic sequence, provided that the
second nucleic acid comprises the nucleotide present in the wild type allele,

thereby treating said subject.

37. The method of claim 36, wherein the second nucleic acid sequence comprises a
polymorphic sequence which includes the nucleotide listed in Table 1, column 5 for
said polymorphic sequence.



38. A method of treating a subject suffering from, at risk for, or suspected of, suffering
from a pathology ascribed to the presence of a sequence polymorphism in a subject,
the method comprising:

a) providing a subject suffering from a pathology associated with aberrant
expression of a polymorphic sequence selected from the group
consisting of polymorphic sequences SEQ ID NOS: 1-7867, or its
complement; and

b) administering to the subject an effective therapeutic dose of a
polypeptide,

wherein said polypeptide is encoded by a polynucleotide comprising a polymorphic
sequence selected from the group consisting of SEQ ID NOS: 1-7867, or by a
polynucleotide comprising a nucleotide sequence that is complementary to any one of
polymorphic sequences SEQ ID NOS: 1-7867, provided that said polymorphic
sequence includes the nucleotide listed in Table 1, column 6 for said polymorphic
sequence.



39. A method of treating a subject suffering from, at risk for, or suspected of suffering
from, a pathology ascribed to the presence of a sequence polymorphism in a subject,
the method comprising:

a) providing a subject suffering from, at risk for, or suspected of suffering
from, a pathology associated with aberrant expression of a first nucleic acid
comprising a polymorphic sequence selected from the group consisting of
SEQ ID NOS: 1-7867, or its complement; and

b) administering to the subject an effective dose of the antibody of claim 34,

thereby treating said subject.

40. A method of treating a subject suffering from, at risk for, or suspected of suffering
from, a pathology ascribed to the presence of a sequence polymorphism in a subject,
the method comprising:

a) providing a subject suffering from, at risk for, or suspected of suffering
from, a pathology associated with aberrant expression of a nucleic acid
comprising a polymorphic sequence selected from the group consisting of
SEQ ID NOS: 1-7867, or its complement; and

b) administering to the subject an effective dose of an oligonucleotide
comprising a polymorphic sequence selected from the group consisting of
SEQ ID NOS: 1-7867, or by a polynucleotide comprising a nucleotide
sequence that is complementary to any one of polymorphic sequences SEQ
ID NOS: 1-7867, provided that said polymorphic sequence includes the
nucleotide listed in Table 1, column 5 or Table 1, column 6 for said
polymorphic sequence,

thereby treating said subject.

41. An oligonucleotide array, comprising one or more oligonucleotides hybridizing to a
first polynucleotide at a polymorphic site encompassed therein, wherein the first
polynucleotide is chosen from the group consisting of:

a) a nucleotide sequence comprising one or more polymorphic sequences
selected from the group consisting of SEQ ID NOS: 1-7867;

b) a nucleotide sequence that is a fragment of any of said nucleotide sequence,
provided that the fragment includes a polymorphic site in said polymorphic
sequence;

c) a complementary nucleotide sequence comprising a sequence
complementary to one or more polymorphic sequences selected from the
group consisting of SEQ ID NOS: 1-7867; and

d) a nucleotide sequence that is a fragment of said complementary sequence,
provided that the fragment includes a polymorphic site in said polymorphic
sequence.

42. The array of claim 41, wherein said array comprises about 10 oligonucleotides.

43. The array of claim 41, wherein said array comprises about 100 oligonucleotides.



44. The array of claim 41, wherein said array comprises about 1000 oligonucleotides.

<400> 1397

aaaatcagat ccgaggcttg tttttccttg tctagatatg ttttaaaaga 50 



<210> 1398 
<211> 51

<212> DNA 

<213> Homo sapiens 



<220> 

<221> allele 
<222> (26)...(0)

<223> single nucleotide polymorphism

<221> misc_feature
<222> (0)...(0)

<223> Accession number cg43094267 
<400> 1398 

acagaagaaa ctacgcaaaa aaagtttgaa gtcatgcaaa ctcctacttt a 51 

<210> 1399 

<211> 51 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> (26)...(0)

<223> single nucleotide polymorphism

<221> misc_feature
<222> (0)...(0)

<223> Accession number cg43950470 
<400> 1399 

attttaactt agagcttttt ttttttaatt ttgtctgccc caagttttgt g 51 

<210> 1400 

<211> 51 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> (26)...(0)

<223> single nucleotide polymorphism

<221> misc_feature
<222> (0)...(0)

<223> Accession number cg43931615 
<400> 1400 

agacacctga gctcactggt gaactttgct tcaagtcctc ctgcaaagca c 51 

<210> 1401 

<211> 50 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> (26) ... (0) 

<223> single nucleotide polymorphism 